Neural networks
StrangeAttractor, 2013-01-09 21:15:10

The neural network "sits on the fence". How do I fix it?

- What is the probability that, when you leave the house, you will meet a live dinosaur in the yard?
- 50%: either you meet one or you don't.

I built a neural network in Encog 3 Workbench. The architecture is an ordinary feedforward network: 89 inputs, 60 neurons in the first hidden layer, 31 in the second (I picked the hidden-layer sizes by simple averaging; maybe they should be different, and I would be grateful for advice), and 2 outputs. I have tried both ActivationTANH (range -1 to 1) and ActivationSigmoid (range 0 to 1) as activation functions, and I train with Resilient Propagation.
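
For reference, here is roughly what that topology looks like in the Encog 3 Java API (the Workbench builds the equivalent network). The placeholder data and the stopping thresholds are my own assumptions, not part of the question:

    import org.encog.engine.network.activation.ActivationTANH;
    import org.encog.ml.data.MLDataSet;
    import org.encog.ml.data.basic.BasicMLDataSet;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.networks.layers.BasicLayer;
    import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

    public class FeedforwardSketch {
        public static void main(String[] args) {
            // 89 inputs -> 60 -> 31 -> 2 outputs, as described above
            BasicNetwork network = new BasicNetwork();
            network.addLayer(new BasicLayer(null, true, 89));                  // input
            network.addLayer(new BasicLayer(new ActivationTANH(), true, 60));  // hidden 1
            network.addLayer(new BasicLayer(new ActivationTANH(), true, 31));  // hidden 2
            network.addLayer(new BasicLayer(new ActivationTANH(), false, 2));  // output
            network.getStructure().finalizeStructure();
            network.reset(); // random initial weights

            // Placeholder data: substitute your real (normalized!) samples.
            double[][] input = new double[10][89];
            double[][] ideal = new double[10][2];
            MLDataSet trainingSet = new BasicMLDataSet(input, ideal);

            ResilientPropagation train = new ResilientPropagation(network, trainingSet);
            int epoch = 0;
            do {
                train.iteration();
                epoch++;
            } while (train.getError() > 0.01 && epoch < 5000); // arbitrary limits
            train.finishTraining();
        }
    }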

The correct answers (the desired levels for the two output neurons) mostly cluster around two levels.

As a result, this cunning beast learns to output roughly the average of the two most probable levels (which is a fact independent of the input data), instead of picking the correct answer that lies closer to one level or the other; after that, learning makes no further progress.

I tried simplifying the task from estimating the value of a characteristic to a logical assignment to one of two groups, rounding all the answers in the training data to 0 or 1 - even then it still gives answers of about 0.5.

What am I doing wrong? Does this mean that, in principle, the desired values cannot be inferred from the chosen input data, or do I just need to go about it differently?

By and large, I do not need perfect accuracy on every answer. The network may give deeply erroneous answers ten or even twenty times out of a hundred (those will be handled another way), but in the remaining cases it must confidently determine at least the probability of belonging to a cluster of values.

Thanks in advance.


3 answers
kmike, 2013-01-10
@StrangeAttractor

It would be worth checking that no mistakes were made in preparing the training data (are the inputs mixed up, are they normalized, etc.). For example, if the data is not normalized and some insignificant variable takes on large values, the neural network will mostly attend to that variable, and the result will depend almost entirely on it; and since the result does not in fact depend on it, the network can "slide" to 0.5 or some other constant value to reduce the error.
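
To illustrate the normalization point, here is a plain-Java sketch of per-column min-max scaling (Encog also ships its own normalization utilities; the class and method names below are mine):

    public class Normalize {
        // Rescale each input column to [lo, hi], e.g. [-1, 1] for TANH
        // or [0.1, 0.9] for the sigmoid. Modifies the data in place.
        static void normalizeColumns(double[][] data, double lo, double hi) {
            int cols = data[0].length;
            for (int c = 0; c < cols; c++) {
                double min = Double.POSITIVE_INFINITY;
                double max = Double.NEGATIVE_INFINITY;
                for (double[] row : data) {
                    min = Math.min(min, row[c]);
                    max = Math.max(max, row[c]);
                }
                double span = (max > min) ? (max - min) : 1.0; // guard constant columns
                for (double[] row : data) {
                    row[c] = lo + (row[c] - min) * (hi - lo) / span;
                }
            }
        }
    }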
Further, you write that you "rounded values to 0 or 1" - are your "clusters of values" < 0.5 and > 0.5? Do the "two levels" sit around 0 and 1, or around something else? What is the meaning of the output variables - does the level actually matter there? Say you have x1 = 0.1, x2 = 0.3, x3 = 1: can you say that in some sense x3 > x2 > x1, and that "a little more, and x2 would become x1"? If the output variable is not genuinely continuous, you can try replacing it with several logical ones ("probability of belonging to a cluster of values") - in the training data these will be 0 or 1, because for the training data you know whether a value belongs to the cluster or not (is that the same as what you did, or not?).
The network parameters (number of neurons, activation functions) can then be chosen so as to reduce the cross-validation error (checking at the end, on a data set that was never used, that you did not overfit to the network parameters) - but those are details.
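
A minimal sketch of that check, assuming you have already split your arrays into training and validation parts (the split itself and the stopping rule below are my assumptions; Encog's BasicNetwork can report the error on any data set):

    import org.encog.ml.data.MLDataSet;
    import org.encog.ml.data.basic.BasicMLDataSet;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

    public class ValidationSketch {
        static void trainWithValidation(BasicNetwork network,
                                        double[][] trainIn, double[][] trainIdeal,
                                        double[][] validIn, double[][] validIdeal) {
            MLDataSet trainSet = new BasicMLDataSet(trainIn, trainIdeal);
            MLDataSet validSet = new BasicMLDataSet(validIn, validIdeal);
            ResilientPropagation train = new ResilientPropagation(network, trainSet);

            double bestValid = Double.MAX_VALUE;
            for (int epoch = 0; epoch < 2000; epoch++) {
                train.iteration();
                double validError = network.calculateError(validSet); // MSE on held-out data
                if (validError < bestValid) {
                    bestValid = validError;
                } else if (validError > bestValid * 1.1) {
                    break; // validation error rising: likely overfitting, stop
                }
            }
            train.finishTraining();
        }
    }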

Fyodor, 2013-01-10
@Richard_Ferlow

How do you immerse yourself in this field so that it stops looking like notes in an incomprehensible language about who knows what?

_ _, 2013-01-10
@AMar4enko

I'm not familiar with Encog 3 Workbench's functionality, but here are a couple of tricks I used back in the day:

  • for the sigmoid, I normalized the input and output data to a range where the sigmoid is not saturated, i.e. where its derivative is noticeably non-zero (for example, 0.1 - 0.9 for the outputs, and correspondingly for the inputs); see the sketch after this list.
  • my network structure was not fixed in advance. I started with one hidden layer, added a neuron, connected it to all input and all output neurons, and trained. If the network could not reach the desired error, I added another neuron to that layer, again fully connected. I set a maximum number of neurons per layer: once the hidden layer reached that size and the error was still large, I added another layer and started over. As soon as an acceptable error was reached, I thinned the network, removing connections whose weight changes had little effect on the result.
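
A sketch of the first trick: squeezing binary targets into a range the sigmoid can hit without saturating. The 0.1/0.9 bounds come from the answer above; the helper names are mine:

    public class TargetSquash {
        // Map a 0/1 target into [0.1, 0.9] before training...
        static double squash(double target) {
            return 0.1 + 0.8 * target; // 0 -> 0.1, 1 -> 0.9
        }

        // ...and map the network's output back when interpreting answers.
        static double unsquash(double output) {
            return (output - 0.1) / 0.8;
        }
    }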
