Algorithms
HaruAtari, 2013-12-29 10:34:15

How to compare images and search for similar ones?

Good morning.
There is a site where users can upload images to a shared gallery. There are quite a lot of images, and I need to check whether an uploaded image is already present in the catalog.
My plan: when saving an image, calculate its hash, then look for similar hashes in the database. If there are none, save the image and its hash.
A number of questions arise:
1. What ready-made solutions for calculating such hashes work on Linux?
2. Is it enough to compare using one algorithm, or would it be advisable to store and compare several hashes?
3. To find similar images, the search has to go by the "deviation" of hashes from each other rather than by exact match. Can such a search be done in the database (PostgreSQL), so that I don't have to pull out every record each time and process it in a programming language?


3 answers
Vitaly Zheltyakov, 2013-12-29
@VitaZheltyakov

The approach is sound, but it can be sped up several times over: analyze not the entire image but only its central part. The size of that part should be proportional to the image dimensions, so that copies of the same image at different resolutions still produce matching hashes.
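
To illustrate the proportional crop, here is a minimal sketch. The function name `central_crop_box` and the default `fraction=0.5` are my own assumptions for the example, not part of any library; the idea is only that the crop scales with the image, so two resolutions of the same picture cover the same relative region.

```python
def central_crop_box(width, height, fraction=0.5):
    """Return (left, top, right, bottom) of a centered region that covers
    `fraction` of each image dimension. Hypothetical helper for illustration."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Crop size is proportional to the image size, never smaller than 1 pixel.
    crop_w = max(1, round(width * fraction))
    crop_h = max(1, round(height * fraction))
    # Center the crop inside the image.
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    print(central_crop_box(800, 600))
    print(central_crop_box(1600, 1200))
```

The resulting box can be passed to whatever imaging library you use before hashing; how much speed this actually buys depends on where the time is spent.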

Seed122, 2013-12-29
@Seed122

I wrote a dissertation on this topic. First of all, read this post:
habrahabr.ru/post/120562
The algorithm described there is the best fit for your task. However, it determines similarity only by the shape of the image and does not take color into account. If you need to take color into account, try extracting the dominant colors (habrahabr.ru/post/136530) and come up with an algorithm for hashing those colors and measuring their similarity.
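
For a feel of how a shape-only hash behaves, below is a rough sketch of an "average hash", one common perceptual hash. I am assuming it is close in spirit to what the linked post describes; it is not the post's code. It works on a plain 2D list of grayscale values so it has no dependencies, and the names `average_hash` and `hamming_distance` are just illustrative.

```python
def average_hash(pixels, hash_size=8):
    """Compute an average hash from a 2D list of grayscale values.

    The image is shrunk to hash_size x hash_size by averaging blocks,
    then each cell becomes one bit: 1 if brighter than the mean, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for gy in range(hash_size):
        y0, y1 = gy * height // hash_size, (gy + 1) * height // hash_size
        for gx in range(hash_size):
            x0, x1 = gx * width // hash_size, (gx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # The same horizontal gradient at two resolutions.
    small = [[x * 16 for x in range(16)] for _ in range(16)]
    large = [[(x // 2) * 16 for x in range(32)] for _ in range(32)]
    print(hamming_distance(average_hash(small), average_hash(large)))
```

Because only brightness relative to the mean is kept, a recolored copy can hash the same as the original, which is the limitation mentioned above.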

YoungSkipper, 2014-12-30
@YoungSkipper

A ready-made solution is www.phash.org/. On OS X it is installed via brew install phash; on Linux I think it is similar. The advantage of this solution is that the difference between two hashes reflects the difference between the images according to this algorithm. If Postgres supports queries like "give me all the records where this field differs from this number by no more than some constant", then you can do the search in Postgres.
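
The "differs by no more than a constant" check is usually a Hamming distance between the two hashes. The sketch below shows that filter in Python, plus a query string that I believe would do the same on the database side. Everything about the schema is an assumption: a hypothetical table `images` with a `hash bit(64)` column, and PostgreSQL 14 or later, which as far as I know is where `bit_count` for bit strings was added.

```python
def hamming_distance(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def find_similar(rows, query_hash, max_distance):
    """Return ids whose hash is within max_distance bits of query_hash.

    `rows` is an iterable of (image_id, hash) pairs, standing in for
    records fetched from the database.
    """
    return [image_id for image_id, h in rows
            if hamming_distance(h, query_hash) <= max_distance]


def to_bit_string(value, width=64):
    """Format an integer hash as a '0'/'1' string, e.g. for a bit(64) cast."""
    return format(value, "0{}b".format(width))


# Assumed schema: images(id, hash bit(64)). `#` is XOR on bit strings;
# bit_count needs PostgreSQL 14+ (my assumption, check your version).
SIMILAR_SQL = """
SELECT id
FROM images
WHERE bit_count(hash # %(query)s::bit(64)) <= %(max_distance)s
"""


if __name__ == "__main__":
    rows = [(1, 0b1111), (2, 0b1110), (3, 0b0000)]
    print(find_similar(rows, 0b1111, 1))
```

A query like this still scans every row, so it saves the round trip to application code rather than the comparison work itself.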
