How to implement comparison or image recognition?

Viktor Koreysha, 2014-06-09 19:27:36

linux

I have many 7px by 14px pictures, each showing a black digit on a white background. Pictures of the same digit look identical: a person cannot tell the 9 in one picture from the 9 in another by eye. There are only 10 visually distinct pictures. They were clearly generated in some automatic way. The task is to determine which digit each picture shows.
The first thing I tried was computing a hash. Unfortunately, it differs between pictures of the same digit, apparently because of some kind of compression.
Then I had the idea of counting pixels, for example black ones: a count falling within a certain range would indicate a particular digit. I dug through the ImageMagick manuals for a long time but could not work out how to count the pixels of a given color.
Please tell me how to do this, or where I can read about it. Or is there perhaps an easier solution, maybe some simple recognizer for Linux?
I am using PHP, and the pictures are fetched from the site with curl. Perfect accuracy is not required.
I would be very grateful for any ideas or tips.

4 answer(s)

lnked, 2014-06-09
@Iktash

If the width and height are fixed, you can pick 2-3 points for each digit and compare by those points. Say the 9 has black at 10x12 and white at 20x14, and the other digits do not; then you compare all the pictures by the color at those points.
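
The probe-point idea can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the 3x5 glyphs, the `PROBES` table, the helper names and the threshold of 128 are all made up for the demo, and the image is assumed to be already decoded into rows of 8-bit grayscale values (in PHP the same values could be read with GD's `imagecolorat`). For real 7x14 pictures the probe coordinates would have to be chosen by looking at the ten digits.

```python
# Assumption: an image is a list of rows of 8-bit grayscale values
# (0 = black, 255 = white), already decoded from the picture file.

def from_art(art):
    """Build a tiny grayscale demo image from ASCII art ('#' = black ink)."""
    return [[0 if ch == "#" else 255 for ch in row] for row in art]

def is_black(img, x, y, threshold=128):
    """Treat anything darker than the threshold as black (tolerates noise)."""
    return img[y][x] < threshold

# Hypothetical probe table for the 3x5 demo glyphs: (x, y, expected_black).
PROBES = {
    "0": [(0, 0, True), (0, 4, True), (1, 2, False)],
    "1": [(1, 0, True), (0, 0, False)],
    "7": [(0, 0, True), (0, 4, False)],
}

def classify(img, probes=PROBES):
    """Return the first digit whose probe points all have the expected color."""
    for digit, checks in probes.items():
        if all(is_black(img, x, y) == expected for x, y, expected in checks):
            return digit
    return None  # no probe set matched

ZERO = from_art(["###", "#.#", "#.#", "#.#", "###"])
ONE = from_art([".#.", ".#.", ".#.", ".#.", ".#."])
SEVEN = from_art(["###", "..#", "..#", "..#", "..#"])

print(classify(ZERO), classify(ONE), classify(SEVEN))
```

The approach is fast, but a single noisy pixel at a probe location flips the answer, so the probes should sit well inside strokes or well inside the background.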

maxxxixxxx, 2014-06-09
@maxxxixxxx

php.ru/forum/viewtopic.php?t=40513
Digit recognition
The simplest approach is to write a program that paints every non-white pixel black, then compares the result pixel by pixel with templates and takes the template with the highest percentage of matches as the answer.
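
A minimal sketch of that binarize-and-compare idea, written in Python for illustration. The 3x5 templates, the function names and the threshold are assumptions made for the demo, not something from the linked thread. The threshold of 128 is used instead of a strict "anything not exactly white" test so that near-white compression noise is not painted black.

```python
# Assumption: an input image is a list of rows of 8-bit grayscale values
# (0 = black, 255 = white). Templates are stored as 0/1 grids (1 = black).

def from_art(art):
    """Build a 0/1 template from ASCII art ('#' = black ink)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in art]

def binarize(img, threshold=128):
    """Paint every pixel darker than the threshold black (1), the rest white (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]

def match_ratio(a, b):
    """Fraction of pixel positions where two equally sized 0/1 grids agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total

def recognize(img, templates):
    """Return the label of the template with the highest match ratio."""
    bits = binarize(img)
    return max(templates, key=lambda digit: match_ratio(bits, templates[digit]))

# Hypothetical 3x5 templates standing in for the real 7x14 digits.
TEMPLATES = {
    "0": from_art(["###", "#.#", "#.#", "#.#", "###"]),
    "1": from_art([".#.", ".#.", ".#.", ".#.", ".#."]),
    "7": from_art(["###", "..#", "..#", "..#", "..#"]),
}

# A noisy "7": gray levels vary and one stray dark pixel sits at x=1, y=3.
sample = [
    [20, 35, 10],
    [240, 250, 30],
    [255, 245, 25],
    [250, 90, 15],
    [248, 255, 40],
]

print(recognize(sample, TEMPLATES))
```

Because the score is a percentage over all pixels, one or two damaged pixels lower the match only slightly instead of changing the result outright.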

Sergey Protko, 2014-06-09
@Fesor

Hashes will get you nowhere, of course: the pictures look the same, but that does not mean their bytes are identical.
Your second idea is closer to a workable solution. Essentially, you need to compute a histogram for the image (the number of white pixels in each column). Counting pixels is easy: go through every pixel of the image, determine its color (white or black), and increment the counter if it is white.
These histograms will differ from digit to digit, while for images of the same digit they will be very similar.
Then you simply store the histogram for each digit. To recognize an input image, compute its histogram, calculate how far it is from each stored one (how much the values differ in each column; there are many ways to measure this, and it is better to look for something ready-made, as there is plenty of it online), and pick the stored variant with the smallest difference.
This method can be classified as "dumb".
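
As a rough illustration of the column-histogram method, here is a sketch in Python. The demo glyphs are 3x5 rather than 7x14, and the names, the threshold of 128 and the choice of the sum of absolute differences as the discrepancy measure are all assumptions; the discrepancy measure is just one of the "many options".

```python
# Assumption: an image is a list of rows of 8-bit grayscale values
# (0 = black, 255 = white).

def to_gray(art):
    """Build a tiny grayscale demo image from ASCII art ('#' = black ink)."""
    return [[0 if ch == "#" else 255 for ch in row] for row in art]

def column_profile(img, threshold=128):
    """Count the white pixels (value >= threshold) in each column."""
    width = len(img[0])
    return [sum(1 for row in img if row[x] >= threshold) for x in range(width)]

def distance(p, q):
    """Sum of absolute per-column differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def recognize(img, known):
    """Return the digit whose stored profile is closest to the image's profile."""
    profile = column_profile(img)
    return min(known, key=lambda digit: distance(profile, known[digit]))

# Stored profiles, computed once from one clean example per digit.
KNOWN = {
    "0": column_profile(to_gray(["###", "#.#", "#.#", "#.#", "###"])),
    "1": column_profile(to_gray([".#.", ".#.", ".#.", ".#.", ".#."])),
    "7": column_profile(to_gray(["###", "..#", "..#", "..#", "..#"])),
}

# A noisy "7": gray levels vary and one stray dark pixel sits at x=1, y=3.
sample = [
    [20, 35, 10],
    [240, 250, 30],
    [255, 245, 25],
    [250, 90, 15],
    [248, 255, 40],
]

print(column_profile(sample), recognize(sample, KNOWN))
```

Different glyphs can in principle share the same column counts, so if two digits turn out to be too close, adding a per-row profile to the comparison is one possible extension.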
Neural networks (a multilayer perceptron is the easiest option, and source code for implementations is plentiful online) work just as well, but training requires at least some training sample plus a test sample of digits. If you have enough of them (at least 5 per digit) and the same number of test images (which should differ somewhat from the training ones), there is no problem. The algorithm itself is fairly simple, although the amount of theory can be slightly terrifying.

xandox, 2014-06-10
@xandox

The easiest way is to average the pictures for each digit: (I1 + I2 + ... + In) / n
Then compute the cross-correlation, sum(I1(x, y) * I2(x, y)) / (sum(I1(x, y)) * sum(I2(x, y))), between each averaged image and the image being checked. The digit with the largest cross-correlation is most likely the answer, although it will clearly be wrong sometimes.
It is better to convert from 8 bits per channel to float per channel before processing.
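
A hedged Python sketch of the averaging and correlation steps follows. Two things in it are assumptions rather than part of the answer. First, pixels are inverted into a float "ink" intensity (black = 1.0, white = 0.0) so that the strokes, not the background, drive the score. Second, the score is normalized by sqrt(sum(A^2) * sum(B^2)), the usual normalized cross-correlation, instead of the product of plain sums written above, because with the plain sums a template whose ink is a subset of the image's ink can score as high as the correct one. The 3x5 glyphs, gray levels and function names are invented for the demo.

```python
import math

def render(art, ink, paper):
    """Demo helper: grayscale image from ASCII art with given ink/paper levels."""
    return [[ink if ch == "#" else paper for ch in row] for row in art]

def to_ink(img):
    """Convert 8-bit grayscale to float ink intensity: black -> 1.0, white -> 0.0."""
    return [[(255 - px) / 255.0 for px in row] for row in img]

def average(images):
    """Pixel-wise mean of equally sized float images: (I1 + ... + In) / n."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [[sum(im[y][x] for im in images) / n for x in range(width)]
            for y in range(height)]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized float images."""
    num = sum(pa * pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    den = math.sqrt(sum(p * p for r in a for p in r) *
                    sum(p * p for r in b for p in r))
    return num / den if den else 0.0

def recognize(img, templates):
    """Return the digit whose averaged template correlates best with the image."""
    ink = to_ink(img)
    return max(templates, key=lambda digit: ncc(ink, templates[digit]))

# Hypothetical 3x5 glyphs standing in for the real 7x14 digits.
GLYPHS = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", ".#.", ".#.", ".#.", ".#."],
    "7": ["###", "..#", "..#", "..#", "..#"],
}

# Average two slightly different renderings of each digit into one template.
TEMPLATES = {
    digit: average([to_ink(render(art, 10, 250)), to_ink(render(art, 30, 240))])
    for digit, art in GLYPHS.items()
}

for digit, art in GLYPHS.items():
    print(digit, recognize(render(art, 40, 230), TEMPLATES))
```

Working in floats, as suggested above, avoids integer overflow and rounding when summing and averaging 8-bit values.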