Finding nearest points in multidimensional space?

S

savinov_ao2012-10-31 17:09:37

PostgreSQL

savinov_ao, 2012-10-31 17:09:37

The task is as follows: there is an n-dimensional space (n somewhere in the region from 11 to 31), it has k points (about 100.000), points are rarely added, rarely removed. It is necessary to answer queries like "find all points at a Cartesian distance less than L from a given point P(x1,x2,...,xn)". Many people recommend PostgresSQL as a DBMS, use SP_GiSP indexes (written by talented people from Moscow State University), but I have not found specific examples of such a task anywhere. In the standard postgres, I did not find a built-in type that can be used as a store of a point in a multidimensional space. The cube type, as far as I understand, needs to be delivered as an extension (it hasn’t worked out yet) and it’s not at all a fact that SP_GiSP and GiSP indexes can support it at all. I met with postgres absolutely for the first time, compared to mysql it seems like a dense forest. Can anyone help with specific advice? I'm on the right track, is the cube type suitable for my task? Maybe there are some ready-made solutions, implementation examples? ..

Reply

Answer the question

In order to leave comments, you need to log in

5 answer(s)

G

gleb_kudr, 2012-10-31
@gleb_kudr

It all depends on how often you need to operate on the data.
Offhand:
1. If the data rarely changes, but is often requested, it is better to make an index by the distance between the edges.
This is n * (n-1) / 2 records (the number of edges of the graph with a given number of nodes), that is, about 5 billion records. A perfectly reasonable number.
2. If the data not only rarely changes, but is also rarely requested, it is quite possible not to make an index at all and count everything on the fly, discarding the left results. You need a powerful machine + optimizations for speed (and this, by the way, is perfectly parallel on the GPU). But in this case, everything can be kept in memory and the database is not needed at all.

S

savinov_ao, 2012-10-31
@savinov_ao

As the most universal solution, in which the things I need are most likely already implemented. Are there separate ready-made daemons that perform my task? To write the kd tree itself and filtering by distance, I honestly don’t have a desire ...

A

Alexander Korotkov, 2012-11-02
@smagen

1) SP-GiST and GiST are spelled correctly.
2) cube supports GiST indexes
www.postgresql.org/docs/9.2/static/cube.html
but does not support search by distance (you can search for entry into a rectangular area, and then additional filtering). Implementation is not a problem, no one needed it badly enough. SP-GiST for cube has not yet been implemented either.
3) In general, if you have experience programming in C and a desire to understand, then everything can be done.

M

Michael, 2012-11-02
@mtyurin

FUNCTION 8 must be added for gist for cube. you can see an example in extension btree_gist

A

AnnInDark, 2012-11-22
@AnnInDark

Very belatedly, I can offer a clumsy and mathematical solution: in order not to load the database with unnecessary types and additions, write the coordinates directly into the database and read them.
For example, make a table "space" in which the id of the point and the 2nd column is an array of its coordinates.
For example, for "space3"
1 {1,2,6}
2 {5,0,8}
...
well, consider the distance as mathematics through the function)