Machine learning
Chichi, 2015-12-04 22:00:00

How to handle null values of categorical variables in regression (machine learning)?

I am trying to do a regression analysis with many variables (multiple-feature regression). For some records, some variables have no value assigned and are set to null. For ordinal variables, I can replace the nulls with the mean of the data. But what if the variable is categorical, for example color or area of the city? For clarity, here is a picture:
c98c138f35c14ba9bd449bfbb083203b.jpg
This example data contains several categorical variables: color, material, security, type, area. How should nulls be replaced in such data? Or should I simply treat Null as a separate value (class) of the variable and leave it at that? Wouldn't that be too primitive?
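To make the two options in the question concrete, here is a minimal sketch in plain Python. The rows, the column values, and the "missing" label are hypothetical and not taken from the picture. The first function treats Null as its own category during one-hot encoding; the second is a common alternative (not mentioned in the question) that replaces Null with the most frequent observed value.

```python
from collections import Counter

# Hypothetical rows; None marks a missing (null) value.
rows = [
    {"color": "red", "area": "center"},
    {"color": None, "area": "north"},
    {"color": "blue", "area": None},
    {"color": "red", "area": "center"},
]

MISSING = "missing"  # label chosen for this sketch only


def one_hot_with_missing(rows, column):
    """One-hot encode `column`, treating None as a separate category."""
    values = [r[column] if r[column] is not None else MISSING for r in rows]
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def fill_with_mode(rows, column):
    """Replace None in `column` with the most frequent observed value."""
    observed = [r[column] for r in rows if r[column] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [r[column] if r[column] is not None else mode for r in rows]


categories, encoded = one_hot_with_missing(rows, "color")
areas = fill_with_mode(rows, "area")
```

With the separate-category approach the regression gets its own coefficient for "value unknown", which can help when the fact that a value is missing is itself informative.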

1 answer(s)
protven, 2015-12-05
@ChicoId

Is there no clear correlation from which the missing data could be restored?
If not, I would check whether the available values are normally distributed. If they are, I would fill the NULLs with random values drawn from a normal distribution. And, as standard practice, I would evaluate the resulting model on a test sample.
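A rough sketch of this random-fill idea, using only the Python standard library. It assumes the normality check has already passed for the numeric column; the data and function names are hypothetical. Since a normal distribution only makes sense for numeric values, the second function is an assumed categorical analogue (not something stated in the answer): it draws each replacement from the observed values, so categories appear in proportion to their observed frequencies.

```python
import random
import statistics


def fill_with_normal_draws(values, seed=0):
    """Replace None with draws from N(mean, stdev) of the observed values."""
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    rng = random.Random(seed)  # fixed seed so the fill is reproducible
    return [v if v is not None else rng.gauss(mu, sigma) for v in values]


def fill_from_observed(values, seed=0):
    """Replace None with a random observed value (frequency-proportional)."""
    observed = [v for v in values if v is not None]
    rng = random.Random(seed)
    return [v if v is not None else rng.choice(observed) for v in values]


# Hypothetical columns with missing entries.
prices = [10.0, 12.5, None, 11.0, None, 9.5]
colors = ["red", None, "blue", "red", None]

filled_prices = fill_with_normal_draws(prices)
filled_colors = fill_from_observed(colors)
```

Because the fill is random, it is worth repeating it with several seeds and comparing the model's score on the test sample each time, to see how sensitive the result is to the imputed values.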
