Neural network for classifying extra small images?
Hello, I need to write a neural network for what I'd call conditional image classification. "Conditional" because the images are squares of 20x20, 15x15, or 10x10 pixels.
I was able to make a classifier for small squares of up to 15x15 pixels. Now I need to compare these squares: there are two squares at the input and 2 neurons at the output (similar or not similar). Before this, the first layer had at most 300 neurons, and now it is from 200 to 800 (depending on the chosen square size). I didn't use convolutional neural networks and would rather not use them now either (that is, I want to do it with a classical fully connected network).
My question is whether this approach can work, or whether a convolutional neural network will still be necessary?
Which of the sizes (10x10, 15x15, 20x20) would be best in your opinion? (example images)
How many layers, and how many neurons in each layer, should be used? (Before this, I chose them experimentally.)
I usually trained a separate neural network for each RGB channel (3 in total) and then combined their results by majority vote. Should I do the same now? A rough sketch of my current setup is below.
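To make the setup concrete, here is a minimal sketch of what I mean (tf.keras and the exact layer sizes are just for illustration, not my real code):

```python
import numpy as np
import tensorflow as tf

def make_channel_net(square_size=15):
    """One fully connected net per colour channel: two flattened squares in,
    2 softmax outputs (similar / not similar)."""
    inputs = tf.keras.Input(shape=(2 * square_size * square_size,))
    x = tf.keras.layers.Dense(300, activation="relu")(inputs)
    x = tf.keras.layers.Dense(100, activation="relu")(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

channel_nets = [make_channel_net() for _ in range(3)]   # one net per RGB channel

def predict_pair(img_a, img_b):
    """img_a, img_b: uint8 arrays of shape (15, 15, 3); returns 0 or 1 by majority vote."""
    votes = []
    for c, net in enumerate(channel_nets):
        pair = np.concatenate([img_a[..., c].ravel(), img_b[..., c].ravel()])
        pair = pair.astype("float32")[None, :] / 255.0
        votes.append(int(np.argmax(net.predict(pair, verbose=0))))
    return max(set(votes), key=votes.count)   # majority vote over the 3 channels
```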
Thank you in advance for your response.
There is no universal answer of the form "do it this way, it will be better." So keep experimenting and find what works best for your data.
Neural networks are not primarily about the networks themselves, but about the data they are trained on.
The more cases the data covers, the better; you can even generate additional images from the ones you already have.
How you generate them depends on how the images are obtained: for example, if they contain noise, create hundreds of variants from a single image by adding noise; if distortions are possible (resizing, rotation), decide whether those should count as similar when comparing; and so on.
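For example, a rough augmentation sketch (the noise level and transforms here are placeholders; use whatever distortions actually occur in your data):

```python
import numpy as np

def augment(img, n_copies=100, noise_sigma=5.0, rng=None):
    """Generate n_copies noisy/rotated/flipped variants of one small square.
    img: uint8 array of shape (H, W, 3)."""
    rng = rng if rng is not None else np.random.default_rng()
    out = []
    for _ in range(n_copies):
        aug = img.astype("float32")
        aug += rng.normal(0.0, noise_sigma, size=aug.shape)   # pixel noise
        aug = np.rot90(aug, k=int(rng.integers(0, 4)))        # 0/90/180/270 rotation
        if rng.random() < 0.5:
            aug = np.fliplr(aug)                              # mirror
        out.append(np.clip(aug, 0, 255).astype("uint8"))
    return out
```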
The more pairs of images there are (including all combinations of the available ones), the more likely the network will "understand" what exactly determines similarity.
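Generating all pairs is straightforward; for instance (the "same class = similar" labelling here is an assumption, substitute your own similarity criterion):

```python
from itertools import combinations

def make_pairs(images, labels):
    """images: list of arrays; labels: list of class ids of the same length.
    Returns (img_a, img_b, is_similar) tuples for every combination of two images."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(images), 2):
        pairs.append((a, b, int(labels[i] == labels[j])))
    return pairs
```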
About RGB: within the kinds of changes under which you need to consider images similar, is colour actually involved? If not, why not get rid of it altogether? Or, on the contrary, feed all three channels into one network at once and let it figure things out. You can also replace RGB with HSL, which is closer to human perception and its notion of similarity.
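A conversion sketch using only the standard library (colorsys calls the colour space HLS, i.e. the channel order is H, L, S; a per-pixel loop is fine for 20x20 squares):

```python
import colorsys
import numpy as np

def rgb_to_hls_image(img):
    """img: uint8 array (H, W, 3) in RGB.
    Returns a float32 array (H, W, 3) with H, L, S channels in [0, 1]."""
    out = np.zeros(img.shape, dtype="float32")
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            r, g, b = img[y, x] / 255.0
            out[y, x] = colorsys.rgb_to_hls(r, g, b)
    return out
```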
About layers: usually the first layers act as a classifier (a feature extractor). A common suggestion is to first build a classifier and then take its network as part of the future network that solves the actual problem, as a way to speed up the search. The later layers are determined by the complexity of the problem.
P.S. Try a variant of the network where there is a single output: a similarity score. From an implementation point of view there is no difference, but it may be easier for the network than learning two binary outputs.
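Something along these lines (a sketch only, assuming tf.keras and a classifier whose input is one flattened square; adapt it to your own code):

```python
import tensorflow as tf

def build_comparator(classifier, square_size=15):
    """classifier: an already-trained per-square model taking one flattened
    square of shape (square_size*square_size,). Its hidden layers are reused
    as a shared encoder for both squares of a pair."""
    # Everything except the classifier's own output layer becomes the encoder.
    encoder = tf.keras.Model(classifier.input, classifier.layers[-2].output)
    encoder.trainable = False                      # can be unfrozen later for fine-tuning

    in_a = tf.keras.Input(shape=(square_size * square_size,))
    in_b = tf.keras.Input(shape=(square_size * square_size,))
    merged = tf.keras.layers.Concatenate()([encoder(in_a), encoder(in_b)])
    x = tf.keras.layers.Dense(64, activation="relu")(merged)
    score = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # single similarity score
    model = tf.keras.Model([in_a, in_b], score)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```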