Neural network: what range should the weights be in?
Hello, I am studying neural networks and some questions came up. Should the input parameters be normalized to the range from 0 to 1, or from -1 to 1? And what range of numbers should the weights have? I downloaded a library and looked inside: in every case the weights fall in the range from 0 to 1.
Another question about the weights. As I understand it, the output value of a neuron should also be between 0 and 1, but if there are a lot of inputs, say 1000 (one neuron receives 1000 values, say from the previous layer), then the output always goes off the scale and drops back to something more or less sane only when the weights have values like 0.00..., i.e. starting from thousandths. Am I doing something wrong, or is this normal? And how many iterations should it take for the weights to go down from, say, 0.1... to 0.0001...?
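To show what I mean, here is a rough sketch in Python (all the numbers are made up, just for illustration):

import numpy as np

rng = np.random.default_rng(0)

n = 1000                          # inputs to one neuron
x = rng.uniform(0, 1, n)          # inputs normalized to [0, 1]
w_big = rng.uniform(0, 1, n)      # weights in [0, 1]
w_tiny = rng.uniform(0, 0.001, n) # weights on the order of thousandths

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(x @ w_big))   # ~1.0: the weighted sum is ~250, the sigmoid saturates
print(sigmoid(x @ w_tiny))  # a "sane" value near 0.5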
If possible, please explain in plain language. Thanks a lot.
The type of input and output values depends on the meaning you put into them and on the network architecture (in particular, on the activation functions at the outputs of the neurons).
When processing text, for example, the input is often a sequence of word IDs for a sentence, i.e. integers from 0 to <number of words in the dictionary>.
In image processing, the ReLU activation function is often used, whose output is non-negative.
Normalization of the inputs is useful when the ranges of the features differ greatly from each other even though the features are roughly equal in importance, and the features themselves are real numbers (for example, when the input data are the length of an icicle on the roof in millimeters and the outside temperature in degrees: the first feature is on the order of hundreds to thousands, the second on the order of units to tens).
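A minimal sketch of such normalization (the data here is invented purely for illustration):

import numpy as np

# Invented data: column 0 is icicle length in mm (hundreds to thousands),
# column 1 is outside temperature in degrees (units to tens).
X = np.array([[ 850.0,  -5.0],
              [1200.0, -12.0],
              [ 300.0,  -1.0],
              [2100.0, -20.0]])

# Min-max normalization: each feature is scaled into [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: each feature gets zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

After this, both features live on comparable scales and neither one dominates the weighted sums.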
How the weights are initialized in the layers makes a big difference to how well backprop works. But this area has been studied quite well, and standard solutions such as Glorot initialization or orthogonal initialization are used everywhere by default, so there is no need to worry about it.
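For example, a sketch of Glorot (Xavier) uniform initialization for a fully connected layer (the layer sizes here are arbitrary):

import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng()):
    # W ~ U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    # which keeps the variance of activations roughly constant across layers.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(1000, 100)
print(W.std())  # small values on the order of 1/sqrt(fan_in), not [0, 1]

Note that for a layer with 1000 inputs this gives small weights right from the start, which matches what you observed: with many inputs the weights have to be small so that the weighted sum stays sane.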
"how many iterations must pass for the weights to go down from 0.1.. to 0.0001.." can be rephrased as "why backprop is slow and how to speed it up". This is generally one of the fundamental tasks in DL. Weight initialization is one way to partially solve. Various activation functions - different. New layer architectures - third. Training data modification is the fourth. Etc.
Maybe this will be news to you, but neither the magnitude of the weights, nor their range, nor their normalization matters at all.
What matters is the decision function: backpropagation fits ANY initial weight values toward the values at which the decision function makes the fewest errors. What exactly those values are makes no difference at all: whether they lie between 0.01 and 0.02 (with a step of 0.0000001) or, say, between -1000000000 and +10000000000, the result will be the same (the weights get adjusted until the decision function gives the required response).
As for normalization, in this sense it is a meaningless operation: say you divide the "incoming signal" from all the neurons by the number of neurons; that number is always a constant, and a constant has no effect on the process of fitting the coefficient (the coefficient itself will simply be larger or smaller by that constant). As I said, we are not interested in the absolute value of the coefficient; what matters is the interaction between the coefficient and the decision function.
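A toy illustration of that point with a single linear coefficient fitted by least squares (everything here is invented for demonstration):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = 3.0 * x + rng.normal(0, 0.1, 200)  # target relation: y ≈ 3x

def fit(inputs):
    # Least-squares fit of one coefficient: w = <inputs, y> / <inputs, inputs>
    return (inputs @ y) / (inputs @ inputs)

c = 1000.0
w_raw = fit(x)        # ~3
w_scaled = fit(x / c) # ~3000: the constant simply moves into the coefficient
print(np.allclose(w_raw * x, w_scaled * (x / c)))  # True: identical predictions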
I hope the idea is clear.
Write your own neural network, try calculating the coefficients by hand, and get a feel for it all yourself.
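If it helps, here is a minimal from-scratch sketch along those lines (one hidden layer, plain NumPy; the toy problem, layer sizes and learning rate are arbitrary):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy problem: XOR. 4 samples, 2 features.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Small random weights, scaled by ~1/sqrt(fan_in) as discussed above.
W1 = rng.normal(0, 1 / np.sqrt(2), (2, 8))
b1 = np.zeros(8)
W2 = rng.normal(0, 1 / np.sqrt(8), (8, 1))
b2 = np.zeros(1)

lr = 2.0
for step in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of the mean squared error, by hand.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient step.
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]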