How to handle null values of categorical variables in regression (machine learning)?

Chichi, 2015-12-04 22:00:00

Machine learning

I am trying to do a regression analysis with many variables (multiple-feature regression). For some records, some variables have not been assigned a value and are set to null. For numeric variables, I can replace the nulls with the mean of the observed values. But what if the variable is categorical, for example color or area of the city? For clarity, here is a picture:

This data example contains several categorical variables: color, material, security, type, and area. How should nulls be replaced in such data? Or should I simply treat null as a separate category (class) and leave it at that? Wouldn't that be too primitive?
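To make the "null as a separate category" option concrete, here is a minimal sketch in plain Python. The toy rows, the `MISSING` label, and the helper `one_hot_with_missing` are all hypothetical names made up for illustration; only the column names `color` and `area` come from the question.

```python
# Hypothetical toy rows; None marks a missing (null) categorical value.
rows = [
    {"color": "red", "area": "north"},
    {"color": None, "area": "south"},
    {"color": "blue", "area": None},
    {"color": "red", "area": "north"},
]

MISSING = "missing"  # assumed label for the extra category


def one_hot_with_missing(rows, column):
    """One-hot encode `column`, treating None as its own category."""
    # Map every null to the explicit MISSING level before encoding.
    values = [r[column] if r[column] is not None else MISSING for r in rows]
    levels = sorted(set(values))
    # One indicator per level; exactly one of them is 1 in each row.
    encoded = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, encoded


levels, encoded = one_hot_with_missing(rows, "color")
print(levels)
for vector in encoded:
    print(vector)
```

With this encoding the regression gets its own coefficient for the missing level, so it can pick up any signal in the fact that a value was not recorded. Whether that is too primitive for a given dataset is something only a comparison on held-out data can show.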


1 answer(s)

protven, 2015-12-05
@ChicoId

Is there no explicit correlation with other variables from which the missing data could be restored?
If not, I would check whether the distribution of the available values is normal. If it is, I would fill the nulls with random values drawn from that normal distribution. And, as standard practice, I would evaluate the resulting model on a test sample.
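For a categorical column, a rough analogue of the random-fill idea is to draw replacements from the observed category frequencies rather than from a normal distribution. The sketch below shows that swapped-in variant, not the exact procedure described above; the helper `impute_by_sampling` and the sample `colors` list are hypothetical.

```python
import random
from collections import Counter


def impute_by_sampling(values, rng):
    """Replace None entries with draws from the observed category frequencies."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to sample from")
    counts = Counter(observed)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    # Observed values are kept as-is; each null gets an independent weighted draw.
    return [
        v if v is not None else rng.choices(categories, weights=weights, k=1)[0]
        for v in values
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
colors = ["red", "red", None, "blue", None, "red", "green", None]
filled = impute_by_sampling(colors, rng)
print(filled)
```

When the model is later checked on a test sample, the frequencies would normally be estimated from the training rows only, so that nothing from the test rows leaks into the imputation.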