Python
un1t, 2016-02-06 19:26:42

Why do we need normalization of quantitative features?

I'm going through the tutorial here mlbootcamp.ru/article/tutorial
First, the data is prepared into a form that can be fed to the algorithm. Everything was clear up to this point:

Feature normalization
Many machine learning algorithms are sensitive to the scaling of the data. Such algorithms include, for example, the nearest neighbor method, support vector machines, etc.
In this case, it is useful to normalize the quantitative features. This can be done in a variety of ways. For example, each quantitative feature can be scaled to zero mean and unit standard deviation:

# Select the quantitative columns, then standardize each one:
# subtract the column mean and divide by the column standard deviation
data_numerical = data[numerical_columns]
data_numerical = (data_numerical - data_numerical.mean()) / data_numerical.std()
data_numerical.describe()
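
To see what this does end to end, here is a minimal self-contained sketch. The DataFrame and column names below are made up for illustration; in the tutorial, data is assumed to be a pandas DataFrame and numerical_columns a list of its numeric column names:

import pandas as pd

# Hypothetical toy data standing in for the tutorial's DataFrame
data = pd.DataFrame({
    "age":  [22, 35, 58, 41],       # roughly a 0-100 scale
    "rate": [0.1, 0.9, 0.4, 0.6],   # roughly a 0-1 scale
})
numerical_columns = ["age", "rate"]

data_numerical = data[numerical_columns]
data_numerical = (data_numerical - data_numerical.mean()) / data_numerical.std()

# After standardization every column has mean ~0 and std exactly 1
print(data_numerical.mean())  # ~0 for both columns (up to floating point)
print(data_numerical.std())   # 1.0 for both columns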

I would like to understand the intuition behind this normalization.
Can you explain it in simple terms or share some links?


1 answer
Andrey Druzhaev, 2016-02-06
@un1t

The meaning is very simple. I'll try to explain with an example.
Metric algorithms draw their conclusions from estimates of the distance between points (or between a point and a separating line). Suppose we have two variables: one ranges from 0 to 100, the second from 0 to 1.
Take two points, (0, 0) and (100, 1). The Euclidean distance between them is:
((100 - 0)**2 + (1 - 0)**2) ** 0.5.
You can see that the distance estimate is formed almost entirely by the first variable, so the values of the second variable will have little effect on the algorithm's final result. And this is only because the data is not normalized, not because the second variable is actually irrelevant to the problem.
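
To make this concrete, here is a minimal sketch (the four sample points are made up for illustration) that computes this Euclidean distance before and after the standardization from the tutorial:

import pandas as pd

# Hypothetical sample: x1 spans 0-100, x2 spans 0-1
data = pd.DataFrame({"x1": [0.0, 100.0, 50.0, 25.0],
                     "x2": [0.0, 1.0, 0.5, 0.25]})

def euclidean(a, b):
    # Euclidean distance between two rows
    return (((a - b) ** 2).sum()) ** 0.5

# Raw distance between (0, 0) and (100, 1): dominated by x1
print(euclidean(data.iloc[0], data.iloc[1]))  # ~100.005, x2 barely matters

# Standardize: zero mean, unit standard deviation per column
scaled = (data - data.mean()) / data.std()

# After scaling, both variables contribute equally to the distance
print(euclidean(scaled.iloc[0], scaled.iloc[1]))  # ~3.31, split evenly between x1 and x2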
