WinconeCoder, 2020-12-06 20:42:48
Python

How to find the smallest Euclidean distance in the database?

We have one array:

[-0.14760402  0.18719167  0.04150053 -0.0921068  -0.16887078  0.02324709
 -0.04641612 -0.11630221  0.12370145 -0.07014455  0.24573359  0.03904079
 -0.27129778  0.02959204 -0.07295675  0.13609795 -0.13187698 -0.08968797
 -0.2012838  -0.16615982  0.0217059   0.00041627 -0.01858913  0.04955338
 -0.19024704 -0.18434721 -0.06982315 -0.05426939  0.15494351 -0.10451394
 -0.04299729 -0.07180098 -0.1690695  -0.08467829 -0.02749946 -0.00308757
 -0.01261706 -0.14929104  0.19307789 -0.05911519 -0.11811744 -0.00415748
  0.11933747  0.22268862  0.15321049 -0.04101308  0.12349229 -0.06126648
  0.09076166 -0.18876649  0.06199023  0.07674868  0.09749875  0.0716019
  0.12201017 -0.19290529 -0.01353829  0.21687341 -0.11010585  0.12630889
  0.04276971  0.00293659 -0.10534084 -0.08965943  0.12599346  0.09744062
 -0.07307861 -0.21842727  0.18325034 -0.15839542 -0.05597069  0.09004606
 -0.089193   -0.11996002 -0.3178705   0.05085036  0.38763958  0.19086862
 -0.14595589  0.02082828 -0.01978934 -0.07487711  0.05899249 -0.01674179
 -0.11979252 -0.02747795 -0.08680353  0.08834615  0.25229415  0.01445329
  0.00105722  0.23681179  0.08904405 -0.09763616  0.01047772  0.13714562
 -0.15587574 -0.02978589 -0.03372395  0.0463781   0.15200463 -0.13603318
  0.02152741  0.10868748 -0.1757534   0.06744438 -0.0318573   0.01892542
 -0.0262807   0.04844372 -0.15358914 -0.05309738  0.22137821 -0.20882189
  0.1269407   0.24391431 -0.02090507  0.02553483  0.11997325  0.03864738
  0.07777584  0.04860133 -0.09686276 -0.13455498  0.01073158 -0.15386821
  0.02141182  0.0842296 ]


It is a set of floating-point numbers, each between -1 and 1.

We save it to the database.

We generate a second array:
[-0.18303205  0.07142412  0.01505323 -0.14673074 -0.17361142  0.02646742
 -0.03395279 -0.06831738  0.08645248 -0.07506754  0.22879875  0.00961941
 -0.21720028 -0.10564265  0.09235184  0.12250473 -0.09408396 -0.07704318
 -0.17983794 -0.10640886 -0.00760378  0.05881869  0.00609878  0.05506223
 -0.17767537 -0.30160218 -0.03009136  0.00688204 -0.02562195 -0.02175772
 -0.05757777  0.13162929 -0.16575503 -0.06576031  0.01825872  0.09892717
 -0.0990886  -0.04563086  0.16782466  0.00385343 -0.1655371   0.02370039
  0.06626146  0.26851994  0.25856623 -0.0427581   0.02065502 -0.03230648
  0.16015874 -0.30130655  0.06620336  0.1480027   0.08171275  0.02545679
  0.0743178  -0.09076802  0.00253361  0.18856071 -0.1644216  -0.03282868
 -0.00277766 -0.10307762 -0.12198099 -0.19316623  0.18896221  0.17105345
 -0.14985694 -0.1295868   0.08943356 -0.06235509 -0.10340865  0.04370517
 -0.19336724 -0.18820783 -0.37302011  0.09345791  0.33356905  0.1724074
 -0.21920988  0.01767435 -0.02863912 -0.01240025  0.08466867  0.0631162
 -0.08921493 -0.05865925 -0.02537591  0.08397133  0.18167101 -0.01622136
 -0.05787662  0.27105331  0.12237041  0.04605677  0.03525903  0.05793242
 -0.03242231 -0.06490842 -0.16594143  0.0506063   0.07158192 -0.08893429
  0.03126835  0.12600896 -0.18564469  0.23052448  0.03958503  0.06578501
  0.01276818 -0.08363014 -0.08138505 -0.0220259   0.09097432 -0.2337397
  0.10124329  0.1670257   0.05445097  0.14788289  0.12662688 -0.00762837
 -0.00578771  0.04282037 -0.15355538 -0.03583786  0.16012162 -0.03436102
  0.05533902  0.09013008]


Now we need to find, among the arrays in the database, the one with the smallest Euclidean distance to the array above.

It is calculated this way:
dist = 1 - numpy.linalg.norm(a - b)

For these two arrays it equals 0.17664886547843417, in other words, the arrays are about 17% similar.

Goal: find the array in the database with the largest such percentage, which is the same as the smallest distance. MongoDB lets you store an array and index it later. What is the best way to implement this, and with what? Can it be simplified somehow?






1 answer(s)
Ilya Flakin, 2020-12-07
@WinconeCoder

In MongoDB, this cannot be implemented without comparing the vector against every entry in the database.
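To illustrate what "comparing against every entry" looks like once the vectors have been loaded from the database, here is a minimal sketch with NumPy. The function name most_similar and the random test data are hypothetical; it assumes all stored vectors fit in memory as one (n, d) array and uses the formula from the question (1 minus the Euclidean distance).

```python
import numpy as np


def most_similar(query, stored):
    """Return (row index, similarity) of the stored vector closest to `query`.

    `stored` is an (n, d) array with every vector loaded from the database.
    Similarity follows the question's formula: 1 - Euclidean distance.
    """
    stored = np.asarray(stored, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # One Euclidean distance per stored row, computed in a single vectorized call.
    distances = np.linalg.norm(stored - query, axis=1)
    best = int(np.argmin(distances))  # smallest distance == largest similarity
    return best, 1.0 - float(distances[best])


# Hypothetical data standing in for the database contents.
rng = np.random.default_rng(0)
stored = rng.uniform(-0.3, 0.3, size=(1000, 128))
query = stored[42] + 0.01  # a slightly shifted copy of row 42

index, similarity = most_similar(query, stored)
print(index, similarity)
```

This is still a full scan, so the cost grows linearly with the number of stored vectors; it only removes the Python-level loop.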
For vector search, a library such as faiss is typically used.
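A rough sketch of how that could look, assuming the faiss-cpu package is installed and the vectors are kept in a faiss index alongside MongoDB (row numbers in the index would have to be mapped back to document ids yourself). The function name nearest and the test data are hypothetical. IndexFlatL2 is an exact index that returns squared L2 distances, so the square root is taken before applying the question's formula. The NumPy branch is only a fallback so the example runs without faiss.

```python
import numpy as np


def nearest(stored, query, k=1):
    """Return (row ids, similarities) of the k stored vectors closest to `query`."""
    # faiss works on contiguous float32 arrays.
    stored = np.ascontiguousarray(stored, dtype=np.float32)
    query = np.ascontiguousarray(query, dtype=np.float32).reshape(1, -1)
    try:
        import faiss  # pip install faiss-cpu
    except ImportError:
        # Fallback: plain NumPy scan, same result as the exact faiss index.
        sq_dist = ((stored - query) ** 2).sum(axis=1)
        ids = np.argsort(sq_dist)[:k]
        return ids, 1.0 - np.sqrt(sq_dist[ids])
    index = faiss.IndexFlatL2(stored.shape[1])  # exact L2 index of dimension d
    index.add(stored)                           # rows are numbered 0..n-1
    sq_dist, ids = index.search(query, k)       # squared L2 distances, shape (1, k)
    return ids[0], 1.0 - np.sqrt(np.maximum(sq_dist[0], 0.0))


# Hypothetical data standing in for the database contents.
rng = np.random.default_rng(0)
stored = rng.uniform(-0.3, 0.3, size=(1000, 128))
query = stored[42] + 0.01

ids, similarities = nearest(stored, query, k=3)
print(ids, similarities)
```

The flat index still compares the query with every vector, just much faster than a database scan. For large collections faiss also offers approximate indexes that avoid the full scan at the cost of some accuracy.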
