Why do we need normalization of quantitative features?
I'm going through the tutorial here mlbootcamp.ru/article/tutorial
First, the data is prepared into a form in which it can be fed to the algorithm. Everything was clear up to this point:
Feature normalization
Many machine learning algorithms are sensitive to the scale of the features. Such algorithms include, for example, the nearest neighbors method, support vector machines, etc.
In this case, it is useful to normalize the quantitative features. This can be done in a variety of ways. For example, each quantitative feature can be reduced to zero mean and unit standard deviation:
# select the quantitative (numerical) columns
data_numerical = data[numerical_columns]
# standardize: subtract each column's mean and divide by its standard deviation
data_numerical = (data_numerical - data_numerical.mean()) / data_numerical.std()
data_numerical.describe()
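For context, here is a self-contained sketch of the same standardization; the DataFrame and column names are illustrative, not from the tutorial. scikit-learn's StandardScaler performs essentially the same computation:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# toy data standing in for the tutorial's `data`; column names are made up
data = pd.DataFrame({"age": [25, 32, 47, 51],
                     "income": [30000, 52000, 81000, 95000]})
numerical_columns = ["age", "income"]

scaler = StandardScaler()  # subtracts the column mean, divides by the column std
data_scaled = pd.DataFrame(scaler.fit_transform(data[numerical_columns]),
                           columns=numerical_columns)

print(data_scaled.mean())       # ~0 for every column
print(data_scaled.std(ddof=0))  # 1.0 (StandardScaler uses the population std,
                                # unlike pandas' default sample std above)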
The meaning is very simple. I'll try to explain with an example.
Metric algorithms draw their conclusions from estimates of the distance between points, or between a point and a line (hyperplane). Suppose we have two variables: one ranges from 0 to 100, the second from 0 to 1.
Let's take two points, (0, 0) and (100, 1). The distance between them according to the Euclidean metric is
((100 - 0)**2 + (1 - 0)**2) ** 0.5 = (10000 + 1) ** 0.5 ≈ 100.005.
It can be seen that the distance is formed almost entirely by the first variable. It follows that the values of the second variable will have almost no effect on the final result of the algorithm, and only because the data is not normalized, not because the second variable has no bearing on the essence of the problem.
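To make this concrete, here is a minimal sketch in plain Python; min-max scaling to [0, 1] is used here instead of z-scoring purely to keep the arithmetic simple:

# two points in the original scale: variable 1 ranges 0..100, variable 2 ranges 0..1
p, q = (0, 0), (100, 1)

dist = ((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) ** 0.5
print(dist)  # ~100.005, dominated almost entirely by the first variable

# rescale each variable by its range so both lie in [0, 1]
p_n = (p[0] / 100, p[1] / 1)
q_n = (q[0] / 100, q[1] / 1)

dist_n = ((q_n[0] - p_n[0]) ** 2 + (q_n[1] - p_n[1]) ** 2) ** 0.5
print(dist_n)  # ~1.414, both variables now contribute equally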