How does this part of gradient descent work?
# take one gradient step
def step(v, direction, step_size):
    """Move step_size in the given direction from v."""
    return [v_i + step_size * direction_i
            for v_i, direction_i in zip(v, direction)]
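A short usage sketch (my own example, not from the thread): repeatedly applying step with a negative step size against the gradient of f(v) = sum of squares, whose gradient is 2 * v_i per component.

# usage sketch (assumed example): minimize f(v) = sum(v_i ** 2)
def sum_of_squares_gradient(v):
    """Gradient of f(v) = sum(v_i ** 2) is 2 * v_i in each component."""
    return [2 * v_i for v_i in v]

v = [3.0, -4.0, 5.0]               # arbitrary starting point
for _ in range(1000):
    grad = sum_of_squares_gradient(v)
    v = step(v, grad, -0.01)       # negative step size: move against the gradient
print(v)                           # every component ends up close to 0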
1 - Draw a picture for the 1D case. The gradient then reduces to an ordinary derivative, which is the slope of the tangent to our cost function. At each step we check whether the function is increasing or decreasing at the current point (whether the derivative is positive or negative) and, depending on that, shift by the step size toward the side where the function is smaller. Hence the name gradient descent: we descend to the minimum of the function, using the gradient as the direction. In the multidimensional case everything is the same; we essentially do this for each variable.
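To make the 1D picture concrete, here is a small sketch (an illustrative example of my own, not from the answer above): descending f(x) = (x - 3)**2 by moving against the sign of its derivative.

# 1D sketch (assumed example): f(x) = (x - 3)**2, so f'(x) = 2 * (x - 3)
def f_prime(x):
    return 2 * (x - 3)

x = 10.0                  # start to the right of the minimum, where f'(x) > 0
step_size = 0.1
for _ in range(100):
    # if f'(x) > 0 the function rises to the right, so we move left, and vice versa
    x = x - step_size * f_prime(x)
print(x)                  # converges toward 3, the minimum of f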
2 - so as not to get stuck in a local minimum
2 - In part of the initial data there may be some additional structure that can actually lead us to a local minimum. For example, if some N examples in a row have roughly the same inputs or outputs, the model will learn that only data of this kind exists, and it will be harder for it to "part" with this knowledge when training on the following examples.
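A minimal sketch of the idea, assuming a generic per-example update; the gradient_for_example helper below is hypothetical, it only serves to show where the reshuffling happens.

import random

def sgd(data, v, gradient_for_example, step_size=0.01, epochs=10):
    """Stochastic gradient descent that reshuffles the data every epoch,
    so that runs of similar examples do not bias the updates."""
    for _ in range(epochs):
        random.shuffle(data)                       # break up any ordering in the data
        for example in data:
            grad = gradient_for_example(v, example)
            v = step(v, grad, -step_size)          # step against the gradient
    return v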