How to solve the problem of finding duplicates from 1 billion images?
I run each image through GoogLeNet and take the output of the last layer, loss3/classifier. How do I search for duplicates from there?
options:
The faiss authors published an article on searching 1 billion images; the clustering approach described there is quite interesting.
But in my case the easiest solution turned out to be LSH hashing plus Cassandra.
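To make the LSH idea concrete, here is a minimal sketch of random-hyperplane LSH over embedding vectors. It is not the setup from the answer above: the function names are made up for illustration, and an in-memory dict stands in for the key-value store (with Cassandra, the bucket key would presumably serve as the partition key, but that is an assumption).

```python
import numpy as np
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (hypothetical helper, not a library API)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def lsh_key(vec, planes):
    """Hash a vector to an integer: one bit per hyperplane, set by the side the vector falls on."""
    bits = (planes @ vec) > 0
    key = 0
    for b in bits:
        key = (key << 1) | int(b)
    return key

def build_buckets(vectors, planes):
    """Group vector indices by LSH key; the dict stands in for an external store."""
    buckets = defaultdict(list)
    for i, v in enumerate(vectors):
        buckets[lsh_key(v, planes)].append(i)
    return buckets

def candidates(query, planes, buckets):
    """Return indices that share the query's bucket; these still need an exact distance check."""
    return buckets.get(lsh_key(query, planes), [])
```

Vectors with a small angle between them tend to land in the same bucket, so only the bucket contents have to be compared exactly. In practice several independent hash tables are usually used to reduce missed matches.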
1) I haven't read about this network specifically, but "building a space suitable for searching for similar objects" and "recognizing a class of objects" are different tasks that require different training processes;
2) searching a dataset is a separate task that requires training the network beforehand so that it produces embeddings;
3) if you need to search a static dataset, HNSW is a great option, but the index will take a couple of weeks to build; if the dataset is dynamic, then as far as I know faiss doesn't offer anything for that yet.
When using faiss, I got the following results: the index takes 5 hours to build and occupies 67 GB. Here is an estimate of search quality on benchmark data (BigANN, SIFT) for index type IVF262k_HNSW32,PQ64:
parameters                          R@1     R@10    R@100   time (ms/query)
nprobe=16,efSearch=128,ht=246       0.6546  0.8006  0.8006  4.231
nprobe=32,efSearch=128,ht=246       0.7107  0.8818  0.8818  7.783
nprobe=64,efSearch=128,ht=246       0.7435  0.9343  0.9346  14.691
nprobe=128,efSearch=128,ht=246      0.7653  0.9687  0.9692  28.326
nprobe=256,efSearch=128,ht=246      0.7726  0.9829  0.9834  55.375
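For anyone reading the R@k columns: as far as I understand, they give the fraction of queries whose true nearest neighbor appears among the first k results. A minimal sketch of that metric, assuming this definition; `recall_at_k` and its arguments are illustrative names, not part of faiss:

```python
import numpy as np

def recall_at_k(retrieved, true_nn, k):
    """Fraction of queries whose exact nearest neighbor is in the top k results.

    retrieved: (n_queries, >= k) array of ids, ranked by the approximate search.
    true_nn:   (n_queries,) array with the id of the exact nearest neighbor.
    """
    top_k = np.asarray(retrieved)[:, :k]
    truth = np.asarray(true_nn).reshape(-1, 1)
    hits = (top_k == truth).any(axis=1)
    return float(hits.mean())
```

The table shows the usual trade-off: raising nprobe improves recall while the query time grows roughly in proportion.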
How about simplifying?
Total Commander has a duplicate search: it compares files by size first, and then compares the contents of files that have the same size.
There is also another option: compute a checksum for every file right away, load the list into Excel and use the Remove Duplicates button; then, in the Access query wizard, run a query on the difference between the table with the full list and the table without duplicates.
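The same size-then-checksum idea can be scripted instead of going through Excel. A rough sketch, assuming SHA-256 as the checksum and a local directory tree; the function names are made up for illustration. Note that this only finds byte-identical files, not resized or recompressed copies.

```python
import hashlib
import os
from collections import defaultdict

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(root):
    """Return groups of paths under root whose contents are byte-identical."""
    # Step 1: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_sha256(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

For a billion files the size and hash tables would not fit comfortably in memory on one machine, so the (size, checksum, path) rows would have to go into a database, but the logic stays the same.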