linux
Silm, 2016-08-23 16:55:20

How to implement a search for similar photos on the server?

There is a service where users can upload photos to the server. When a photo is uploaded, we need to check whether the same or a very similar photo already exists on the server.
Duplicates with slight changes are very likely: resizing, tilting, color changes, adding or removing small elements. I know about the libpuzzle library (https://github.com/jedisct1/libpuzzle), but it is not clear whether and how it can be used to build an index and search it. I'm looking for other libraries or off-the-shelf tools, not necessarily in or for PHP. The main thing we need is the methodology itself: how to index the images and then search the database.


4 answer(s)
SharuPoNemnogu, 2016-08-23
@SharuPoNemnogu

Based on this article, I wrote a class that uses the discrete cosine transform: it calculates a hash, stores it in the database, and computes the Hamming distance between hashes, which determines the degree of similarity. The algorithm is not fast, but color changes, resizing, and a slight tilt are, in principle, not a problem for it. For PHP there seems to be a pHash extension.
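For illustration, here is a minimal sketch of that kind of DCT-based hash in Python. It is not the class described above, just the general idea under some assumptions: the image has already been decoded, converted to grayscale and scaled down to a 32x32 matrix of numbers (that step is not shown), the DCT is left unnormalized, and the names `dct_2d_lowfreq`, `phash` and `hamming` are made up for this example.

```python
import math


def dct_2d_lowfreq(pixels, keep=8):
    """Top-left keep x keep block of an unnormalized 2D DCT-II of a square matrix."""
    n = len(pixels)
    # Cosine basis for the first `keep` frequencies only (the low frequencies).
    cos = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
           for u in range(keep)]
    # 1D transform along every row: tmp[y][u].
    tmp = [[sum(row[x] * cos[u][x] for x in range(n)) for u in range(keep)]
           for row in pixels]
    # 1D transform along every column of the intermediate result: out[v][u].
    return [[sum(tmp[y][u] * cos[v][y] for y in range(n)) for u in range(keep)]
            for v in range(keep)]


def phash(pixels, keep=8):
    """64-bit hash: one bit per low-frequency coefficient, set if above the median."""
    block = dct_2d_lowfreq(pixels, keep)
    coeffs = [c for row in block for c in row]
    # The first coefficient (DC term) is just overall brightness, so it is
    # left out when the median is computed.
    rest = sorted(coeffs[1:])
    median = rest[len(rest) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")
```

The integer returned by `phash` can be stored in an ordinary 64-bit column. Two photos are treated as similar when `hamming` of their hashes is below some threshold, which has to be tuned on real data.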

xmoonlight, 2016-08-23
@xmoonlight

The search relies on distinctive fragments of the image's contours (a line that bends at an angle of 15-165 degrees within one such fragment; if none is found, widen the radius; if one is found, exclude that zone from further search) and on their positions relative to each other.
All contour fragments are rotated to a canonical orientation by the same rule: for example, the radius line (from center to edge) with the most points always points down. If several radii tie, the largest angle between them is always placed on the left.
Each such fragment gets its own hash. A set of such fragments forms a "tree" of hashes.
When searching, we compare the hashes of the query fragments with those in the database using the Hamming distance, and look along the tree for the sequence of relationships that matches best.

un1t, 2016-08-23
@un1t

blog.iconfinder.com/detecting-duplicate-images-usi...
There are ready-made implementations for any language, dhash for example.
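To show how little is involved, here is a rough sketch of a difference hash (dHash) in Python. It assumes the image has already been converted to grayscale and shrunk to 8 rows of 9 values (decoding and resizing are not shown). The function names and the threshold of 10 bits are made up for this example and would need tuning.

```python
def dhash(pixels):
    """Difference hash: one bit per horizontally adjacent pair of pixels.

    `pixels` is expected to be 8 rows of 9 grayscale values, giving 64 bits.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # The bit records only whether brightness falls from left to right,
            # so a uniform brightness shift does not change the hash.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(hash_a, hash_b, max_distance=10):
    """Assumed rule of thumb: few differing bits means a probable duplicate."""
    return hamming_distance(hash_a, hash_b) <= max_distance
```

On upload, compute `dhash` for the new photo and compare it against the stored hashes. Exact duplicates can be found with a plain equality lookup on the hash column, while near-duplicates need the Hamming distance check.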

Dimonchik, 2016-08-23
@dimonchik2013

pHash is simple but, alas, it does not work when the image has been cropped.
Neural networks, including ones trained on ImageNet, can handle that, but they are many times more expensive: you will certainly need more than one server, or a single one with 40 cores.
