How does this part of gradient descent work?
# take one gradient step
def step(v, direction, step_size):
    """Move step_size in the given direction from v."""
    return [v_i + step_size * direction_i
            for v_i, direction_i in zip(v, direction)]
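A short usage sketch (my own example, not from the thread): repeatedly applying step with a negative step size against the gradient of f(v) = sum of squares, whose gradient is 2 * v_i per component.

# usage sketch (assumed example): minimize f(v) = sum(v_i ** 2)
def sum_of_squares_gradient(v):
    """Gradient of f(v) = sum(v_i ** 2) is 2 * v_i in each component."""
    return [2 * v_i for v_i in v]

v = [3.0, -4.0, 5.0]               # arbitrary starting point
for _ in range(1000):
    grad = sum_of_squares_gradient(v)
    v = step(v, grad, -0.01)       # negative step size: move against the gradient
print(v)                           # every component ends up close to 0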
1 - Draw a picture for the 1D case. The gradient then reduces to an ordinary derivative, which is the slope of the tangent to our cost function. At each step we check whether the function is increasing or decreasing at the current point (whether the derivative is positive or negative) and, depending on that, shift by the step size toward the side where the function is smaller. Hence the name gradient descent: we descend to the minimum of the function, using the gradient as the direction. In the multidimensional case everything is the same; we essentially do this for each variable.
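To make the 1D picture concrete, here is a small sketch (an illustrative example of my own, not from the answer above): descending f(x) = (x - 3)**2 by moving against the sign of its derivative.

# 1D sketch (assumed example): f(x) = (x - 3)**2, so f'(x) = 2 * (x - 3)
def f_prime(x):
    return 2 * (x - 3)

x = 10.0                  # start to the right of the minimum, where f'(x) > 0
step_size = 0.1
for _ in range(100):
    # if f'(x) > 0 the function rises to the right, so we move left, and vice versa
    x = x - step_size * f_prime(x)
print(x)                  # converges toward 3, the minimum of f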
2 - so as not to get stuck in a local minimum
2 - In part of the initial data there may be some additional structure that can actually lead us to a local minimum. For example, if some N examples in a row have roughly the same inputs or outputs, the model will learn that only data of this kind exists, and it will be harder for it to "part" with this knowledge when training on the following examples.
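A minimal sketch of the idea, assuming a generic per-example update; the gradient_for_example helper below is hypothetical, it only serves to show where the reshuffling happens.

import random

def sgd(data, v, gradient_for_example, step_size=0.01, epochs=10):
    """Stochastic gradient descent that reshuffles the data every epoch,
    so that runs of similar examples do not bias the updates."""
    for _ in range(epochs):
        random.shuffle(data)                       # break up any ordering in the data
        for example in data:
            grad = gradient_for_example(v, example)
            v = step(v, grad, -step_size)          # step against the gradient
    return v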