PostgreSQL
ADv1S, 2016-10-25 17:00:47

How should the architecture for keyword subscriptions be organized (using Avito as an example)?

The situation is as follows. We have:
1) A Postgres database that stores ~50,000 user subscriptions (saved queries). A query looks like this: Cars, Moscow, mileage up to 100 thousand, description contains "winter tires", "in native paint".
2) An ElasticSearch cluster that frequently receives new documents (ads), up to 500 per minute.
Task: send out notifications about new ads, the way avito and auto.ru do.
What happens now:
About 500 new ads are loaded per minute and put into ElasticSearch, and once a minute we start a process that checks which ad matches which user. That is, I build 50,000 ElasticSearch queries restricted to the IDs of the new records (the user queries contain full-text conditions, which ES handles) and slowly execute them on the ElasticSearch cluster, 500 at a time. Some ads can be filtered out on the backend, for example when the region does not match, but a lot of queries still remain. As a result, ElasticSearch chokes on that many full-text queries, and search on the site starts to slow down terribly.
The queries look roughly like this:
1) Among the 500 new ads, find those that contain the phrase "white color", in the city of Kazan
2) Among the 500 new ads, find those that contain "winter tires" or "full power package", in the city of Moscow
........
and there are 49,000 more.
Do you have any ideas on how to better organize such a mailing solution? Or can anyone share their experience of how giants like avito and auto.ru decide whether a document matches a user's subscription?
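The backend pre-filtering by region mentioned above could look roughly like the following. This is only a minimal sketch: the record layout (`id`, `city`, `phrases`, `description`) and the function names are assumptions made for illustration, not the actual schema. It groups the new ads by city once per batch and skips every subscription whose city has no new ads, so only the remaining subscriptions need a full-text query.

```python
from collections import defaultdict

# Hypothetical subscription records; the field names are assumptions.
subscriptions = [
    {"id": 1, "city": "Kazan", "phrases": ["white color"]},
    {"id": 2, "city": "Moscow", "phrases": ["winter tires", "full power package"]},
    {"id": 3, "city": "Sochi", "phrases": ["in native paint"]},
]

# Hypothetical batch of ads loaded during the last minute.
new_ads = [
    {"id": 101, "city": "Moscow", "description": "Sedan, winter tires included"},
    {"id": 102, "city": "Kazan", "description": "Hatchback, white color"},
]


def group_ad_ids_by_city(ads):
    """Map each city to the IDs of the new ads posted there."""
    by_city = defaultdict(list)
    for ad in ads:
        by_city[ad["city"]].append(ad["id"])
    return by_city


def plan_requests(subs, ads):
    """Return (subscription id, candidate ad IDs) pairs worth querying.

    Subscriptions whose city has no new ads in this batch are skipped,
    so no full-text query is built for them at all.
    """
    by_city = group_ad_ids_by_city(ads)
    return [(s["id"], by_city[s["city"]]) for s in subs if s["city"] in by_city]


planned = plan_requests(subscriptions, new_ads)
print(planned)
```

With this sample data the Sochi subscription is dropped before any query is built; how much this helps in practice depends on how the subscriptions are spread across regions.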

2 answer(s)
xmoonlight, 2016-10-26
@ADv1S

1. Extract the entities (nouns) from the query: you can use this
2. Check them against a pre-built dictionary of synonyms and unify everything that contains inaccuracies or is a synonym.
3. Link the tags of the current ad to the GENERAL list of ad tags for the entire system.
4. Under each ad, display only 5-6 tags: the ones with the largest number of ads linked to them across the whole system.
5. Place the IDs of the ads that match the user's subscription (tags, etc.) into that user's sending queue.
6. As soon as the pool of new ads exceeds a threshold value, send out the newsletter. For example, for every 30 new ads in the user's total queue:

if ($newItemsForUser >= 30) {
   /*
      command that requests the start of the mailing,
      for example, a command sent to a microservice via its API
   */
}
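Steps 2, 5 and 6 of the list above can be sketched in Python as follows. This is a rough illustration, not a finished implementation: the synonym dictionary, the `SubscriptionQueues` class and its method names are all hypothetical, and entity extraction (step 1) is assumed to have already produced the tags.

```python
from collections import defaultdict

# Hypothetical synonym dictionary (step 2): variant -> canonical tag.
SYNONYMS = {
    "snow tires": "winter tires",
    "winter tyres": "winter tires",
}

SEND_THRESHOLD = 30  # taken from the example above: every 30 new ads


def normalize_tag(tag):
    """Lower-case a tag and replace a known synonym with its canonical form."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)


class SubscriptionQueues:
    """Per-user queues of matching ad IDs, flushed at a threshold."""

    def __init__(self, threshold=SEND_THRESHOLD):
        self.threshold = threshold
        self.users_by_tag = defaultdict(set)    # canonical tag -> user IDs
        self.queue_by_user = defaultdict(list)  # user ID -> pending ad IDs

    def subscribe(self, user_id, tag):
        self.users_by_tag[normalize_tag(tag)].add(user_id)

    def add_ad(self, ad_id, tags):
        """Queue the ad for every subscribed user (step 5).

        Returns {user_id: [ad IDs]} for users whose queue reached the
        threshold (step 6); those queues are emptied.
        """
        matched = set()
        for tag in tags:
            matched |= self.users_by_tag.get(normalize_tag(tag), set())
        ready = {}
        for user_id in sorted(matched):
            queue = self.queue_by_user[user_id]
            queue.append(ad_id)
            if len(queue) >= self.threshold:
                ready[user_id] = list(queue)  # copy before clearing
                queue.clear()
        return ready


# Usage with a small threshold so the flush is easy to see.
queues = SubscriptionQueues(threshold=2)
queues.subscribe("user-1", "Winter tires")
first = queues.add_ad(101, ["snow tires"])
second = queues.add_ad(102, ["winter tyres", "white color"])
print(first, second)
```

Here the matching is a dictionary lookup per tag of each new ad, so the work grows with the number of ads rather than the number of subscriptions. The returned mapping is where the mailing command from the snippet above would be issued.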

Dimonchik, 2016-10-26
@dimonchik2013

sphinxsearch.com/blog/2013/06/21/faceted-search-wi...
