MongoDB
Stanislav Martynov, 2020-11-04 21:13:50

Is it possible to do a fast map lookup over 1 million markers (MongoDB) with clustering?

Hello. There is a MongoDB database with 1 million points for a map. The client displays these markers on a map and, on every move, sends the bounds of the visible area plus the zoom level to the server.
1. This data needs to be clustered on the server
2. It needs to happen as quickly as possible
I've spent a long time trying to figure out what can be done here, and so far I have no answer.
To give an approximate picture:
1. Size: the database holds 1 million points and receives 20-30k new entries every week, so in theory it could reach 2-3 million in a year or two.
2. Users: the site has 200-600 users per hour, and each of them hits the map on every pan or zoom.
3. Speed: ideally the query should just target the desired area of the map, without range filtering like A < B < C. Reaching 0.005 s per request would be great (if that's realistic).

I'm asking here because I'm trying different options and it's eating a lot of time; maybe someone has run into a similar task and can save me from stepping on the same rakes and wasting even more of it.

What I tried:
1. I tried to implement a QuadTree, but as far as I understand it only works well when everything is kept in memory; I couldn't figure out how to implement a QuadTree on top of MongoDB. It answers in 0.005 s, but the entire marker collection has to live in memory.
2. I tried assigning a QuadKey to each marker: divide the map into quadrants, each quadrant into 4 parts, and so on down to 23 levels of depth, so that a QuadKey identifies its tile on the map. In theory, a search by QuadKey plus grouping gives excellent clustering, but the request takes 0.8 s at high zoom levels like 14-17, and 1-2 s at low zoom when many objects fall into view. I tried playing with it in different ways: substring matching on the QuadKey, converting the QuadKey to a uint and running range queries like A < B < C. But it always came down to 0.8-3 s, and that is with indexes in Mongo.
3. I tried building a 2d index in Mongo on the marker positions and searching with $geoWithin, but such a query completes in 0.8-0.9 s.
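For reference, the QuadKey scheme described in item 2 appears to be the Bing Maps tile system. Here is a minimal sketch of the key computation in Python (Web Mercator projection assumed; the function name is illustrative). The useful property is that all points inside a tile have keys starting with that tile's QuadKey, which is what makes prefix or range queries over an indexed string field possible:

```python
import math

def latlon_to_quadkey(lat, lon, level):
    """Convert a WGS84 coordinate to a Bing-style QuadKey at the given zoom level."""
    # Clamp latitude to the Web Mercator limits
    lat = max(min(lat, 85.05112878), -85.05112878)
    # Normalized Web Mercator coordinates in [0, 1)
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    # Tile coordinates at this zoom level
    n = 1 << level
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    # Interleave the tile bits into a base-4 string, one digit per zoom level
    quadkey = []
    for i in range(level, 0, -1):
        digit = 0
        mask = 1 << (i - 1)
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        quadkey.append(str(digit))
    return "".join(quadkey)
```

Storing the level-23 key once per marker and querying by prefix keeps all the work on a single indexed string field; a coarser tile's key is always a prefix of the keys of the points inside it.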
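For item 3, the query shape would look roughly like this in pymongo-style syntax (a sketch only: the field name `loc` is an assumption, and `$box` applies to legacy coordinate pairs served by a 2d index, not a 2dsphere one):

```python
def box_query(west, south, east, north):
    """Build a $geoWithin viewport query.

    Assumes markers store legacy coordinate pairs [lon, lat] in a field
    named `loc` (hypothetical name) with a 2d index created as:
        db.markers.create_index([("loc", "2d")])
    `$box` takes the bottom-left and top-right corners of the rectangle.
    """
    return {"loc": {"$geoWithin": {"$box": [[west, south], [east, north]]}}}
```

`db.markers.find(box_query(...))` would then return the raw markers for the viewport; clustering or grouping still has to happen afterwards, e.g. in an aggregation stage or in application code.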

The map works over web sockets, and it would be great not to push megabytes of data to the client, which is why clustering is needed.
I may well have made mistakes in the three implementations described, and perhaps higher speed is achievable.
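In the simplest case, the server-side clustering being asked about can be plain grid binning: snap each point in the viewport to a grid cell whose size depends on zoom, and return one centroid plus a count per non-empty cell. A minimal sketch (assuming the viewport's points have already been fetched as (lon, lat) pairs; all names are illustrative):

```python
from collections import defaultdict

def grid_cluster(points, west, south, east, north, cells=32):
    """Cluster (lon, lat) points inside a viewport by binning them into a
    cells x cells grid. Returns (centroid_lon, centroid_lat, count) per
    non-empty cell; `cells` controls cluster granularity per zoom level."""
    buckets = defaultdict(list)
    cell_w = (east - west) / cells
    cell_h = (north - south) / cells
    for lon, lat in points:
        if west <= lon <= east and south <= lat <= north:
            cx = min(cells - 1, int((lon - west) / cell_w))
            cy = min(cells - 1, int((lat - south) / cell_h))
            buckets[(cx, cy)].append((lon, lat))
    clusters = []
    for pts in buckets.values():
        n = len(pts)
        clusters.append((sum(p[0] for p in pts) / n,
                         sum(p[1] for p in pts) / n, n))
    return clusters
```

The payload sent over the web socket then shrinks from thousands of markers to at most `cells * cells` cluster records, regardless of how dense the area is.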

The questions themselves:
Has anyone faced a similar task and knows how best to cluster the markers? Which algorithm is most efficient?
How can search over 1 million markers be implemented so that it takes as little time as possible?
Does caching make sense here?
Does anyone know how services like Yandex.Maps and Google Maps work on the server side?
Are there ready-made solutions for C#?

I would be grateful for any link, personal experience or useful information

3 answers
mayton2019, 2020-11-04
@mayton2019

Almost all geo search engines use either a quad-tree or an R-tree to store geometry. Why Mongo is here at all is unclear: it's a database of a different kind, for documents.
Take the trees and use them. Not enough memory? Solve it with fast disks, or just buy more nodes for parallel searches.
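For illustration, a minimal in-memory point quadtree along the lines this answer suggests (a sketch with an arbitrary bucket size and depth cap, not production code):

```python
class QuadTree:
    """Minimal in-memory point quadtree: insert points, query a rectangle."""
    MAX_POINTS = 8   # bucket size before a node splits
    MAX_DEPTH = 23   # guard against endless splits on duplicate points

    def __init__(self, x0, y0, x1, y1, depth=0):
        self.bounds = (x0, y0, x1, y1)
        self.depth = depth
        self.points = []
        self.children = None  # four sub-quadrants after a split

    def insert(self, x, y):
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.MAX_POINTS or self.depth >= self.MAX_DEPTH:
                self.points.append((x, y))
                return True
            self._split()
        for child in self.children:
            if child.insert(x, y):
                return True
        return False

    def query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect all stored points inside the query rectangle."""
        if out is None:
            out = []
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return out  # no overlap with this node, prune the subtree
        for (px, py) in self.points:
            if qx0 <= px <= qx1 and qy0 <= py <= qy1:
                out.append((px, py))
        if self.children:
            for child in self.children:
                child.query(qx0, qy0, qx1, qy1, out)
        return out

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x <= x1 and y0 <= y <= y1

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        d = self.depth + 1
        self.children = [
            QuadTree(x0, y0, mx, my, d), QuadTree(mx, y0, x1, my, d),
            QuadTree(x0, my, mx, y1, d), QuadTree(mx, my, x1, y1, d),
        ]
        # push the existing bucket down into the children
        pts, self.points = self.points, []
        for (px, py) in pts:
            for child in self.children:
                if child.insert(px, py):
                    break
```

A viewport query only visits nodes overlapping the query rectangle, which is why the in-memory variant answers in milliseconds even on a million points.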

Vladimir Korotenko, 2020-11-04
@firedragon

Say 1 million markers take a kilobyte each: that's a gigabyte of memory.
Suppose it's clumsy JavaScript with a 3x overhead: 3 GB.
Allow another 3x for growth: 9 GB.
That's a not-very-big server with 16 GB of RAM.
And that's the brute-force approach; with something like Redis on top, memory comes down to about 3 GB even in the worst case.

Roman Mirilaczvili, 2020-11-05
@2ord

Try PostGIS for data storage. These algorithms are already implemented in it, and the data is retrieved with ordinary SQL queries.
For speed, try building a lookup table mapping the input parameters to the requested area: given the longitude and latitude, you can then quickly get the desired area. Caching the coordinates of frequently requested areas will speed things up even more.
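One way to read the caching suggestion: key the cache by discrete tile coordinates at each zoom level rather than by raw floating-point viewport bounds, so repeated views of the same popular area actually hit the cache. A hypothetical sketch (function names and the cache size are made up):

```python
import math
from functools import lru_cache

def viewport_tiles(west, south, east, north, zoom):
    """Snap a lon/lat viewport to Web Mercator tile coordinates at `zoom`,
    so cache keys become small discrete integers instead of raw floats."""
    n = 1 << zoom

    def tx(lon):
        return min(n - 1, max(0, int((lon + 180.0) / 360.0 * n)))

    def ty(lat):
        lat = max(min(lat, 85.05112878), -85.05112878)
        s = math.sin(math.radians(lat))
        y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
        return min(n - 1, max(0, int(y * n)))

    # Tile y grows southward, so north maps to the smaller y value
    return (tx(west), ty(north), tx(east), ty(south))

@lru_cache(maxsize=4096)
def clustered_tile(zoom, x, y):
    """Hypothetical cached unit of work: fetch and cluster one tile's markers.
    Popular tiles are then served from memory on repeat requests."""
    raise NotImplementedError("plug in the DB query + clustering here")
```

A request for a viewport then iterates over the tile range from `viewport_tiles` and assembles the response from per-tile cached results.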
