How to handle null values of categorical variables in regression (machine learning)?
I am trying to run a regression analysis with many variables (multiple regression). For some data points, some variables have no value assigned and are set to null. For ordinal variables, I can replace the nulls with the mean of the data. But what if the variable is categorical, for example a color or an area of the city? For clarity, here is a picture:
In this example there are several categorical variables: color, material, security, type, area. How should nulls be replaced in such data? Or should I just treat null as a separate category (class) and leave it at that? Wouldn't that be too primitive?
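The two options raised in the question, treating null as its own category or replacing it with an existing value, can be sketched roughly as follows. This is only an illustration using pandas: the column names come from the question, but the values are invented, and filling with the most frequent category is an assumed categorical analogue of mean imputation, not something the question specifies.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "color": ["red", None, "blue", "red"],
    "area": ["north", "south", None, "south"],
    "price": [100.0, 120.0, 90.0, 110.0],
})

cat_cols = ["color", "area"]

# Option A: treat the missing value as its own category.
as_category = df.copy()
as_category[cat_cols] = as_category[cat_cols].fillna("missing")

# Option B (assumed alternative): fill with the most frequent category.
as_mode = df.copy()
for col in cat_cols:
    as_mode[col] = as_mode[col].fillna(as_mode[col].mode().iloc[0])

# One-hot encode so a regression model can consume the categorical columns.
X = pd.get_dummies(as_category[cat_cols])
print(X.columns.tolist())
```

With option A the model gets a separate indicator column for the missing value, so it can learn whether missingness itself carries information.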
Is there no explicit correlation from which the missing data could be restored?
If not, I would check whether the distribution of the available values is normal. If it is, I would fill the NULLs with random values drawn from a normal distribution. And, as usual, I would then evaluate the resulting model on a test sample.
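A minimal sketch of that idea, assuming a numeric column, since a normality check only makes sense for numeric values. The data is synthetic, the Shapiro-Wilk test and the 0.05 threshold are assumed choices, and the median fallback is an addition not taken from the answer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical numeric feature with missing entries (NaN); values are synthetic.
values = rng.normal(loc=50.0, scale=5.0, size=200)
values[rng.choice(values.size, size=20, replace=False)] = np.nan

observed = values[~np.isnan(values)]

# Shapiro-Wilk test: a small p-value suggests the observed values are not normal.
_, p_value = stats.shapiro(observed)

filled = values.copy()
missing = np.isnan(filled)
if p_value > 0.05:  # conventional threshold, assumed here
    # Draw replacements from a normal distribution fitted to the observed values.
    filled[missing] = rng.normal(
        observed.mean(), observed.std(ddof=1), size=missing.sum()
    )
else:
    # Fallback not taken from the answer: use the median of the observed values.
    filled[missing] = np.median(observed)
```

When the model is later checked on a test sample, it is probably safest to estimate the mean and standard deviation from the training split only, so the test data does not leak into the imputation.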