Python
Anton Savchenko, 2017-05-30 09:24:40

Why does the error rate keep jumping while training the network?

Why does the training error rate behave like this? [attached screenshot: de816a1450504f62b313bf83b88baab3.PNG, a plot of the training error rate]
There is a three-layer neural network that takes images from the MNIST database as input. Accordingly, the input layer has 784 neurons, one per pixel; the hidden layer has 30 neurons; and the output layer has 10, one per class. The hidden layer uses tanh as its activation function, and the output layer uses softmax. Cross-entropy is the loss function. The training set contains 60,000 images. I train the network with stochastic gradient descent on mini-batches of 5, 10, or 100 random elements of the training set (the batch size doesn't matter; the overall result does not change).
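For reference, the architecture described above (784-30-10, tanh hidden layer, softmax output, cross-entropy, mini-batch SGD) can be sketched in NumPy roughly as follows. The weight initialization, learning rate, and the synthetic stand-in data are my assumptions, not taken from the question; real MNIST images would be loaded separately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the question: 784 inputs, 30 hidden neurons, 10 classes.
n_in, n_hidden, n_out = 784, 30, 10

# Small random weights (hypothetical initialization; the question gives none).
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, n_out))
b2 = np.zeros(n_out)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    h = np.tanh(X @ W1 + b1)   # hidden layer: tanh activation
    p = softmax(h @ W2 + b2)   # output layer: softmax over 10 classes
    return h, p

def sgd_step(X, y, lr):
    """One mini-batch SGD step; y is a vector of integer class labels."""
    global W1, b1, W2, b2
    m = X.shape[0]
    h, p = forward(X)
    # Softmax + cross-entropy gives the gradient p - onehot(y) at the logits.
    dz2 = p.copy()
    dz2[np.arange(m), y] -= 1.0
    dz2 /= m
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)  # tanh'(x) = 1 - tanh(x)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return -np.log(p[np.arange(m), y] + 1e-12).mean()  # cross-entropy loss

# Synthetic stand-in for MNIST mini-batches of size 10.
X = rng.normal(0, 1, (100, n_in))
y = rng.integers(0, n_out, 100)
losses = [sgd_step(X[i:i + 10], y[i:i + 10], lr=0.1) for i in range(0, 100, 10)]
```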

1 answer
Sergey Sokolov, 2017-05-30
@sergiks

The gradient descent step (learning rate) is too big. Try halving it.
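A toy illustration of why an oversized step makes the error jump around instead of decreasing, using a 1-D quadratic rather than the questioner's network (the learning rates here are arbitrary example values):

```python
def gradient_descent(lr, steps=50):
    """Minimize f(w) = w^2 (gradient 2w) starting from w = 1.0."""
    w = 1.0
    history = []
    for _ in range(steps):
        w -= lr * 2 * w          # one gradient step
        history.append(abs(w))   # distance from the minimum at w = 0
    return history

big = gradient_descent(lr=1.1)    # step too large: |w| overshoots and grows
small = gradient_descent(lr=0.1)  # smaller step: |w| shrinks steadily
```

With `lr=1.1` each update multiplies `w` by `-1.2`, so the iterate overshoots the minimum and diverges; with `lr=0.1` the factor is `0.8` and it converges. The same mechanism makes a network's training error oscillate when the step is too large.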
