A
A
Alex Raven2017-05-29 06:29:19
PHP
Alex Raven, 2017-05-29 06:29:19

How would you solve the problem of searching within a given radius?

In general, the task is slightly non-trivial. Site on WordPress. The user specifies his address, from which we get Latitude, Longitude coordinates using the Geocoding API. In the illustration, the user is listed under index 0, green marker. There are services (for example, lawn mowing, dog walking, pool cleaning, cleaning, etc.), let's call them Services. And there are some organizations that provide these services. Moreover, one organization can provide several services at the same time (for example, mowing lawns and walking dogs, and another - cleaning pools and cleaning). Let's call them Hubs.
And then we look for all the Hubs that can serve these coordinates. Services and Hubs are in our database as custom posts. Hubs has a post meta with latitude, longitude. Basically, the task is not difficult. But it is complicated by the fact that each Hub has its own radius, which it can serve. For example, 10, 20, 50 kilometers and so on. For each Hub can be unique, as well as its coordinates. In the illustration, the service radius is shown as colored circles.
Here you can see that user 0 falls within the service area of ​​Hubs 1 and 2. Hub 3 is about the same distance from 0 as is 2. But its service radius is smaller, so we filter it out of the results. And then from the Hubs that serve this address, we select the one closest to the user. As a result, it turns out that Hub 1 suits us best, since the user is in its service area and it is clearly closer than 2.
0cd711c47ed0445fbd9eac1c1551d0bb.png
In the future (not yet implemented), the algorithm is planned to be upgraded. For example, the service radius of the service is 10 km, but they will also go 20 km, but for double the price.
The algorithm is working. The question is the following. Is it possible to somehow simplify it and reduce the number of iterations? After all, if there are tens of thousands of Hubs, then iterations over everything from the database will take time and load the server, even taking into account caching. In addition, the function is called by ajax after the user marks the next service of interest to him, that is, many times for each user.

function get_closest_hub($service_id, $latitude, $longitude)
{
    // получим все сервисы, которые есть в базе данных
    // поскольку хаб может предоставлять более одного сервиса, мы кэшируем все сервисы в $this->hubs, чтобы уменьшить количество запросов к базе данных. в теории это может вызвать переполнение памяти, то есть это костыль =)
    if (!is_array($this->hubs))
    {
        $hubs = get_posts([ 'post_type' => 'hubs', 'post_status' => 'publish', 'posts_per_page' => -1 ]);
        $this->hubs = $hubs;
    }
    else
    {
        $hubs = $this->hubs;
    }
    $selected = [];
    // возьмем все доступные хабы
    foreach ($hubs as $hub)
    {
        // координаты хаба у нас хитро закодированы в массиве
        $address = get_post_meta($hub->ID, 'hub_coords', true);
        $distance = $this->distance($address['lat'], $address['lng'], $latitude, $longitude);
        // а данные о предоставляемых сервисах - как сериализованный массив в post_content
        $data = unserialize($hub->post_content);
        $good = false;
        $radius = floatval(get_post_meta($hub->ID, 'hub_radius', true));
        $in_range = ($distance <= $radius);
        // если пользователь находится в зоне обслуживания и у хаба определен хоть один сервис
        if ($in_range && is_array($data))
        {
            foreach ($data['active'] as $key=>$value)
            {
                // данный хаб предоставляет сервис $service_id и пользователь находится в радиусе его обслуживания
                if ($key == $service_id && $value == '1')
                {
                    $good = true;
                }
            }
        }
        // если хаб предоставляет хотя бы один из выбранных пользователем сервисов
        if ($good)
        {
            // хаб нам подходит, сохраняем его в массиве
            $hub->distance = $distance;
            $selected[] = $hub;
        }
    }
    if (is_array($selected) && count($selected)>0)
    {
        // сортируем по расстоянию, от меньшего к большему
        usort($selected, [$this, 'distance_cmp']);
        reset($selected);
        // и выбираем первый, то есть ближайший
        $hub = current($selected);
        // получаем множитель цены, для каждого хаба он может быть свой, например: 1.25
        $hub->km_rate = floatval(get_post_meta($hub->ID, 'hub_rate', true));
        if ($hub->km_rate == 0)
        {
            // теоретически он никогда не должен быть равен нулю, но мало ли что...
            $hub->km_rate = 1;
        }
        return $hub;
    }
    // ни одного хаба не найдено
    return false;
}
// сравнение расстояний в двух объектах
function distance_cmp($a, $b)
{
    if ($a->distance == $b->distance)
    {
        return 0;
    }
    return ($a->distance < $b->distance) ? -1 : 1;
}
// Haversine formula
function distance($latitudeFrom, $longitudeFrom, $latitudeTo, $longitudeTo, $earthRadius = 6371)
{
    $latFrom = deg2rad($latitudeFrom);
    $lonFrom = deg2rad($longitudeFrom);
    $latTo = deg2rad($latitudeTo);
    $lonTo = deg2rad($longitudeTo);
    $latDelta = $latTo - $latFrom;
    $lonDelta = $lonTo - $lonFrom;
    $angle = 2 * asin(sqrt(pow(sin($latDelta / 2), 2) + cos($latFrom) * cos($latTo) * pow(sin($lonDelta / 2), 2)));
    return $angle * $earthRadius;
}

PS I understand that the code turned out to be not at all optimal, but when the deadline was 2 weeks ago and you haven't slept for 30 hours, it doesn't work any other way...

Answer the question

In order to leave comments, you need to log in

4 answer(s)
R
Rastishka, 2017-05-29
@Rastishka

I would do something like this:
To speed up the selection, we first look in the database for the bounding box (stupidly, the entry of a dot into the square of each hub).
Then, using the found hubs, we calculate a more accurate distance. If you have a scale - cities (and not countries / continents), you can not bother with the "correct" formula, but use the stupid distance between points (school geometry 5th grade). But this is a controversial optimization.
For complex cases, where distance is less important than service spectrum overlap, I would introduce multipliers, calculate the relevance for each point, and display from the most relevant to the least relevant.

X
xmoonlight, 2017-05-29
@xmoonlight

Task : find the minimum distance between the coordinates of point 0 and the center of any of the hubs, where the radius is greater than or equal to this distance.
Stored procedure (inside the database) :
input parameters: M,N,inRange
procedure task: print N-records starting from the Mth result, sorted by ascending hub distances from point 0, where the hub radii are greater than or equal (inRange=true) the distance between the center of the hub and point 0.
The same procedure can then be used for those hubs that do not reach point 0 with their scope simply by setting the inRange parameter to false.
-------------
1. First, you need to sort all the hubs by distance relative to point 0.
2. Then remove those where the radius has become less than the distance to point 0 - that is, the missing service area. (Let's save the "tail clipping" index in case we don't find it)
3. Of the remaining ones, starting from the top (the closest hub to point 0) - search by any criteria.
4. If there are none, and the missing zones along the radius (AFTER the index of paragraph 2) will be able to expand their radius for an additional. money - then we expand it taking into account the distance and the surcharge, already counting 2 parameters: the presence of point 0 in the radius zone and the cost of the surcharge. We stop when the desired minimum price is reached or until the hubs run out (to find the MOST minimum price).

F
freeExec, 2017-05-29
@freeExec

Use a geospatial database. Where the request will be entirely in the type database:
Select all hubs with the required type and located in the maximum radius (the same 50 km, or actually find MAX from all hubs) from the user, and using the database index. Build a buffer from the found hubs - their given radius and select those that the user has hit.
Those. two nested SELECT-a, not a php code footcloth.

A
Alex Raven, 2017-05-29
@alexraven

It seems to be logical, there is only one problem. The project is on MySQL and I will never get the go-ahead from the client to remake. He simply will not agree to allocate money for this.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question