Python
Alexander Ivanov, 2019-03-12 14:19:46

Where to store vectors (embeddings) for quick comparison?

The actual question is where to store the face vectors produced by facenet, each with 512 dimensions, so that they can be compared quickly. Each vector is an array. We first tried storing them in PostgreSQL using the cube field type, but the search speed was unsatisfactory: with 50k vectors, a comparison takes a minute and a half, which is far too long. We then tried storing the vectors as JSON; the search got faster and now takes 12 seconds, but that is still too long. Much of the time goes into converting the string to an array, so the question is where to store the vectors so that Python receives an array directly. Or perhaps you would recommend some other approach.

2 answer(s)
Vladimir Olohtonov, 2019-03-12
@sgjurano

Here is an article on approximate nearest-neighbor search methods:
https://m.habr.com/ru/company/mailru/blog/338360/
In short, a search over an HNSW index of 500k vectors takes about 5 milliseconds. As for the library, faiss is the better choice; it is more cleanly written than the original nmslib. Both have Python bindings.

Sergey Tikhonov, 2019-03-12
@tumbler

50k * 512 * 8 * 3 ≈ 600 MB
Try keeping the vectors in RAM as numpy arrays. There is also the KD-tree structure, which lets you search for the nearest neighbors of a vector in a K-dimensional linear space. That would speed up your search.
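
A minimal sketch of the in-RAM numpy idea, under a few assumptions: the embeddings are random stand-ins, the function name `nearest_neighbors` and the file name `embeddings.npy` are made up for the example, and the search shown is a plain brute-force scan rather than a KD-tree. Saving with `np.save` and reading with `np.load` gives Python the array directly, with no string parsing.

```python
import os
import tempfile

import numpy as np


def nearest_neighbors(database, query, k=5):
    """Return (indices, distances) of the k rows of `database` closest to `query`."""
    # Squared Euclidean distance from the query to every stored vector.
    diffs = database - query
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)
    # argpartition picks the k smallest without sorting the whole array.
    top = np.argpartition(sq_dists, k - 1)[:k]
    top = top[np.argsort(sq_dists[top])]  # order the k hits by distance
    return top, np.sqrt(sq_dists[top])


# Random stand-in for the face embeddings: one row per face, 512 columns.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5000, 512), dtype=np.float32)

# Persist as a binary .npy file and load it back as an array.
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, embeddings)
loaded = np.load(path)

# Look up the 3 nearest stored vectors to row 123 (which should match itself).
indices, distances = nearest_neighbors(loaded, loaded[123], k=3)
```

Stored as float32, 50k vectors of 512 values take about 100 MB, so the whole set fits comfortably in memory.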
