How to properly prepare data for network training?
Hello!
While studying neural networks I have accumulated a lot of questions whose answers probably require practical rather than theoretical knowledge.
Most articles describe standard examples (for instance, classifying cats vs. dogs) or the inner machinery of neurons, but there are almost no real examples of applied tasks built on one's own data rather than ready-made datasets.
As a result, after reading such an article I am left with more questions than answers, so if anyone has practical experience they are willing to share, I would be very glad to get answers to the questions below.
1) It varies from case to case. You cannot say in advance that 10,000,000 samples will be better than 10,000. What people usually do is collect statistics on how the quality of the predictions changes as the algorithm sees more data, and look at the shape of that curve. If the curve keeps improving as data is added and is still improving at the right edge of the plot, you can conclude that adding even more data is worth it (see the learning-curve sketch after this list).
More data is not always better, though; sometimes the curves show degradation instead.
2) Glasses alone are enough, but photos of people wearing them are better, because other elements (for example nose, eyes, mouth) will then be taken into account as well.
3) There are problems in many areas of ML, and this is one of them :) That said, I have heard there are already well-trained models for this (though I doubt they are free).
4) It doesn't matter; here the model is answering a specific question: is this a dog, yes or no? If not, is it a cat, yes or no? The better the model is at cats and dogs, the more accurate the results will be, but never expect a probability of 1.00 :)
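To make the "look at the curve" advice from point 1 concrete, here is a minimal sketch of a learning curve: the same model is scored on held-out folds while the training set grows, and if the validation score is still climbing at the largest size, more data is likely to help. It uses scikit-learn's `learning_curve`; the logistic regression model, the digits dataset, and all parameter values are just placeholders for your own model and data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits            # stand-in for your own data
from sklearn.linear_model import LogisticRegression  # stand-in for your own model
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

# Validation score as a function of training-set size (5-fold CV at each size).
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),            # 10% ... 100% of the data
    cv=5,
    scoring="accuracy",
)

plt.plot(train_sizes, train_scores.mean(axis=1), label="train")
plt.plot(train_sizes, val_scores.mean(axis=1), label="validation")
plt.xlabel("training examples")
plt.ylabel("accuracy")
plt.legend()
plt.show()

# If the validation curve is still rising at the right edge, collecting more
# data is probably worth it; if it has flattened out, it probably is not.
```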
1. It is not only the volume of the dataset that matters, but also how representative it is. When the network is built incorrectly, it usually learns nothing at all. Insufficient layer size is easy to spot by simply increasing it and running the training again. As for volume: I read somewhere about text classification that an SVM gives the best results with a sample of roughly 2,000 to 50,000 examples, while neural networks need 50,000 or more.
3. This is called interpretability of the neural network / algorithm. There is some research in this direction (how to look at neuron activations and understand what the network does and why), but there seem to be no mature solutions yet. The best approach is a good test set that reveals clusters of errors (see the sketch after this list).
4. Everything depends heavily on the complexity of the specific task. There is no ready-made formula.
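As an illustration of the "test set that reveals error clusters" idea from point 3, here is a minimal sketch that compares model predictions with ground truth on a held-out set and prints a confusion matrix; large off-diagonal cells show which pairs of classes the model mixes up. The class names, labels, and predictions are hypothetical placeholders for your own evaluation data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical held-out labels and model predictions for a 3-class problem.
class_names = ["cat", "dog", "other"]
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2, 1, 0, 1, 2])

# Rows = true class, columns = predicted class. Large off-diagonal cells
# are the "error clusters": pairs of classes the model confuses.
cm = confusion_matrix(y_true, y_pred)
for name, row in zip(class_names, cm):
    print(f"{name:>6}: {row}")

# Per-class precision/recall shows which classes need more or better data.
print(classification_report(y_true, y_pred, target_names=class_names))
```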