Java
Vitaly, 2015-07-15 14:09:31

How to correctly calculate geographic distances in highly loaded services?

There is a service that stores users and their locations. Suppose I need to get all users within a 500 m radius of the current user.
The problem appears when the number of users becomes large. To get the list of users within 500 m, I have to do the following: load all users from the database, calculate the distance from the current user (the one who requested the list) to each of them, filter out everyone farther than 500 m, and return the rest. As a result, the time spent on distance calculations grows linearly with the number of users.
I did a little research: how the calculation time grows as the number of users goes from 1,000 to 1,000,000, in steps of 10,000.
[Graph: calculation time versus number of users]
As the graph shows, 358 ms is quite a lot, given that all users may request the list at the same time.
Can this be solved some other way? (I cannot make the distance calculation function itself any smaller.)
One idea was to divide the entire surface of the Earth into squares, then determine which square the current user is in and calculate the distance only to the users in that same square.
But this raises another problem: if two users are in different but adjacent squares, close to the shared border, they will no longer see each other.
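
For reference, the brute-force steps described in the question might look roughly like the sketch below. The `User` class, the method names, and the choice of the haversine formula are assumptions made for illustration; the question does not show its actual distance function.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force radius search: one distance computation per stored user. */
final class NaiveRadiusSearch {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius in metres

    /** Hypothetical user record: id plus position in degrees. */
    static final class User {
        final long id;
        final double latDeg;
        final double lonDeg;

        User(long id, double latDeg, double lonDeg) {
            this.id = id;
            this.latDeg = latDeg;
            this.lonDeg = lonDeg;
        }
    }

    /** Great-circle distance in metres, computed with the haversine formula. */
    static double distanceMeters(double lat1Deg, double lon1Deg,
                                 double lat2Deg, double lon2Deg) {
        double phi1 = Math.toRadians(lat1Deg);
        double phi2 = Math.toRadians(lat2Deg);
        double sinHalfDPhi = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLambda = Math.sin(Math.toRadians(lon2Deg - lon1Deg) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;
        // Clamp to 1.0 so rounding errors cannot push asin out of its domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Scans every user, so the cost grows linearly with the size of the list. */
    static List<User> withinRadius(User current, List<User> all, double radiusM) {
        List<User> result = new ArrayList<>();
        for (User u : all) {
            if (u.id != current.id
                    && distanceMeters(current.latDeg, current.lonDeg,
                                      u.latDeg, u.lonDeg) <= radiusM) {
                result.add(u);
            }
        }
        return result;
    }
}
```

Every request touches every user, which is why the answers focus on shrinking the candidate set before the exact distance check.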

5 answer(s)
mrrl, 2015-07-15
@silverhawk90

Why not calculate the distance only to users who are in this square or the adjacent ones? You can even limit yourself to 4 squares: the ones around the grid vertex closest to the user.
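
A minimal sketch of the four-squares idea, under two assumptions that are not stated in the answer: positions have already been projected to planar coordinates in metres, and the cell side is at least twice the search radius (for example 1000 m cells for a 500 m radius). With smaller cells the search circle can reach outside the four squares. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Picks the grid cells that have to be scanned for a radius search. */
final class GridCells {
    /** Index of the cell that contains a coordinate given in metres. */
    static long cellIndex(double coordM, double cellSizeM) {
        return (long) Math.floor(coordM / cellSizeM);
    }

    /**
     * Returns the 4 cells {cx, cy} that share the grid vertex nearest to (xM, yM).
     * They cover the whole search circle only if cellSizeM >= 2 * radius.
     */
    static List<long[]> cellsAroundNearestVertex(double xM, double yM, double cellSizeM) {
        long vx = Math.round(xM / cellSizeM); // nearest vertical grid line
        long vy = Math.round(yM / cellSizeM); // nearest horizontal grid line
        List<long[]> cells = new ArrayList<>(4);
        for (long cx = vx - 1; cx <= vx; cx++) {
            for (long cy = vy - 1; cy <= vy; cy++) {
                cells.add(new long[] {cx, cy});
            }
        }
        return cells;
    }

    /** True if the list contains the cell (cx, cy). */
    static boolean contains(List<long[]> cells, long cx, long cy) {
        for (long[] c : cells) {
            if (c[0] == cx && c[1] == cy) {
                return true;
            }
        }
        return false;
    }
}
```

Users would then be indexed by their `cellIndex` pair, so a request reads only the four returned cells and runs the exact distance check on those candidates.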

digital smile, 2015-07-15
@DigitalSmile

Highly loaded projects are always a compromise between speed and data integrity. So first decide whether it is critical for the business logic that a few users fail to see each other. Once you have answered that, you can build either a more reliable or a faster solution.
On our project we moved the zone-hit check into the database, but to be honest we did not test it for speed. If you have time and you use MySQL, it would be interesting to know how fast this calculation runs:
ACOS(SIN(#{userLat})*e.SIN_LATITUDE + COS(#{userLat})*e.COS_LATITUDE*COS(#{userLong} - e.RADIANS_LONGITUDE)) * 6371000 <= #{targetRadius}
SIN_LATITUDE = SIN(RADIANS(#{latitude})), COS_LATITUDE = COS(RADIANS(#{latitude})), RADIANS_LONGITUDE = RADIANS(#{longitude}) were calculated in advance for each object before it was written to the database. Note that #{userLat} and #{userLong} must already be in radians in the expression above.
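
The same calculation can be sketched in Java to show what the precomputed columns buy. This is an illustration under assumptions, not the project's code: the `StoredPoint` class and its field names are hypothetical, and the clamp before `Math.acos` is an addition that guards against rounding pushing the argument slightly outside [-1, 1].

```java
/** Spherical law of cosines with per-object trigonometry computed once. */
final class PrecomputedDistance {
    static final double EARTH_RADIUS_M = 6_371_000.0; // same constant as in the SQL

    /** Mirrors the SIN_LATITUDE, COS_LATITUDE and RADIANS_LONGITUDE columns. */
    static final class StoredPoint {
        final double sinLat;
        final double cosLat;
        final double lonRad;

        StoredPoint(double latDeg, double lonDeg) {
            double latRad = Math.toRadians(latDeg);
            this.sinLat = Math.sin(latRad);
            this.cosLat = Math.cos(latRad);
            this.lonRad = Math.toRadians(lonDeg);
        }
    }

    /** Distance in metres from the requesting user (in degrees) to a stored point. */
    static double distanceMeters(double userLatDeg, double userLonDeg, StoredPoint p) {
        double userLat = Math.toRadians(userLatDeg);
        double userLon = Math.toRadians(userLonDeg);
        double c = Math.sin(userLat) * p.sinLat
                + Math.cos(userLat) * p.cosLat * Math.cos(userLon - p.lonRad);
        // Rounding can produce values like 1.0000000000000002, which would give NaN.
        c = Math.max(-1.0, Math.min(1.0, c));
        return Math.acos(c) * EARTH_RADIUS_M;
    }
}
```

In a loop over many points, `Math.sin(userLat)` and `Math.cos(userLat)` can also be hoisted out, leaving one cosine and one arccosine per candidate.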

AI Lab, 2015-07-15
@vpuhoff

In addition to the "real" coordinates, you can store "approximate" ones in the database, with an error of, say, 1 degree, that is, with the fractional part dropped. Then select from the database only those users whose coordinate satisfies lat-1 < lat < lat+1, with a similar condition for the second coordinate; the resulting selection can then be filtered honestly by the exact distance.
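
A variation on this prefilter, sketched under assumptions: instead of a fixed 1-degree window, the box is sized from the search radius, so far fewer candidates reach the exact check. The class and method names are hypothetical, and the sketch ignores wrap-around at the 180th meridian and clamping of latitude at the poles.

```java
/** Latitude/longitude window that encloses a search circle. */
final class BoundingBox {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius in metres

    /** Returns {minLat, maxLat, minLon, maxLon} in degrees around the given point. */
    static double[] around(double latDeg, double lonDeg, double radiusM) {
        double delta = radiusM / EARTH_RADIUS_M; // angular radius in radians
        double dLatDeg = Math.toDegrees(delta);
        // Longitude half-width widens with latitude; near a pole fall back to the full range.
        double ratio = Math.sin(delta) / Math.cos(Math.toRadians(latDeg));
        double dLonDeg = ratio < 1.0 ? Math.toDegrees(Math.asin(ratio)) : 180.0;
        return new double[] {
            latDeg - dLatDeg, latDeg + dLatDeg,
            lonDeg - dLonDeg, lonDeg + dLonDeg
        };
    }

    /** Cheap prefilter: true if the point lies inside the box. */
    static boolean contains(double[] box, double latDeg, double lonDeg) {
        return latDeg >= box[0] && latDeg <= box[1]
                && lonDeg >= box[2] && lonDeg <= box[3];
    }
}
```

The four bounds map directly onto a range condition on two indexed columns in the database query; only rows that pass it need the exact distance calculation.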

Sergey, 2015-07-15
@begemot_sun

Find the nearest point in a coordinate system?

Вячеслав, 2015-07-24
@vkulakov

I think the right approach is to use a database extension for working with geodata. I don't know of anything suitable for MySQL, but for PostgreSQL there is PostGIS, which makes your task very easy to solve. If switching to PostgreSQL completely is not an option, you could, for example, move just the storage of users' current coordinates and the search over them there.
In one of my projects the main database is MySQL, and PostGIS is an additional one used for working with the map.
