What approach (algorithms, etc.) should I use to build a search by parameters, like on Yandex Market?

blackstone, 2010-12-20 16:39:12

MongoDB

Hello!
I'd like to ask experienced people how to build a universal search over a product database, like the one on Yandex Market.
I understand that the task is not easy, but I would like to at least get closer to the idea of a universal filter.
I.e., we have different types of products with different characteristics. These can be:
- numerical characteristics (ranges, etc.);
- multiple choice from directories;
- single choice from directories;
- different units of measurement;
- other types of criteria.
The introductory article on using MongoDB for this kind of task, "MongoDB, or how to stop loving SQL", was quite interesting: it gave Yandex Market as an example, but it did not cover the topic in full.
Also, tell me: have I perhaps taken on an idea that is too complicated for one person to implement in a reasonable amount of time? Should I then simplify the task rather than pursue a universal solution?
Perhaps I'm not the first to think about this?

9 answer(s)

Alexander, 2010-12-20
@blackstone

Also, tell me: have I perhaps taken on an idea that is too complicated for one person to implement in a reasonable amount of time?

Start with the minimum, and then you will see what is missing!
The main thing is to start; you will refine it along the way. I also simplified my search to a (reasonable) minimum.

Iskander Giniyatullin, 2010-12-20
@rednaxi

In general, your task, as I understand it, is what is called faceted search: en.wikipedia.org/wiki/Faceted_search
When I faced this problem, I solved it this way: there is a "products" table that holds all the products.
There is a reference table of characteristics.
There is a table of triplets: product id, id of the characteristic from the reference table, value.
This whole database is indexed by Sphinx. The rest is done roughly as described in the article: habrahabr.ru/blogs/sphinx/64318/
I.e., first the user makes a search query, for example "samsung phones". With one query to the database we fetch all the matching phones, and by grouping on the characteristic id we get all the characteristics available for these products (for example, the screen diagonal and the operating system); then for each characteristic we select the possible values that match the query. Thanks to multi-queries, such a search works quite quickly.
Then the user is offered a choice of characteristics from the list of possible ones. That is the whole algorithm, in general. It can be implemented in a reasonable time, it works quite fast, and there are no problems with adding products to the table.
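
To illustrate the triplet scheme and the grouping step, here is a minimal sketch in Python. It is not the Sphinx setup from the answer: SQLite stands in for the index, a plain LIKE stands in for the full-text query, and every table and column name (products, specs, product_specs) is an assumption made up for the example.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create an in-memory database with hypothetical demo data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE specs (id INTEGER PRIMARY KEY, name TEXT);
        -- the triplet table: product id, characteristic id, value
        CREATE TABLE product_specs (product_id INTEGER, spec_id INTEGER, value TEXT);
    """)
    conn.executemany("INSERT INTO products VALUES (?, ?)", [
        (1, "Samsung phone A"), (2, "Samsung phone B"), (3, "Nokia phone C")])
    conn.executemany("INSERT INTO specs VALUES (?, ?)", [
        (1, "os"), (2, "diagonal")])
    conn.executemany("INSERT INTO product_specs VALUES (?, ?, ?)", [
        (1, 1, "Android"), (1, 2, "4.0"),
        (2, 1, "Bada"), (2, 2, "4.0"),
        (3, 1, "Symbian"), (3, 2, "3.2")])
    return conn


def facets_for_query(conn, text):
    """Return {characteristic: {value: product count}} for matching products."""
    rows = conn.execute("""
        SELECT s.name, ps.value, COUNT(*)
        FROM products p
        JOIN product_specs ps ON ps.product_id = p.id
        JOIN specs s ON s.id = ps.spec_id
        WHERE p.name LIKE ?          -- stand-in for the full-text match
        GROUP BY s.id, ps.value      -- grouping by characteristic id and value
        ORDER BY s.name, ps.value
    """, (f"%{text}%",)).fetchall()
    facets = defaultdict(dict)
    for spec_name, value, count in rows:
        facets[spec_name][value] = count
    return dict(facets)
```

Calling facets_for_query(build_demo_db(), "samsung") gives the characteristics and values that can be offered to the user as filters for the current result set.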

Antelle, 2010-12-20
@Antelle

I advise you to read about spatial indexes and their well-known implementations, for example, multidimensional R-trees. They can also be useful in solving this problem if, for example, you do a search in a range of parameters.

Anton, 2010-12-20
@conturov

I did this:
Table 1: Product
Table 2: Categories for the product
Table 3: Parameter names
Table 4: Linking Parameters to categories (When choosing a category for a product, I load the corresponding parameters)
Table 5: Linking a parameter value to a specific product (id, id_item, id_param, var_param)
Thus, we assign parameters to a category; when adding a product, we select a category and get the set of parameters for that category.
What I like about it:
1) Any nesting of a category with its own parameters.
2) You can make a dynamic search form in which you can choose options depending on the category.
3) Simple, small queries.
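
The five tables above could look roughly like this. It is only a sketch in Python with SQLite: the column names of item_param come from the answer, while all other table and column names, the id_category column in item, and the demo data are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item (id INTEGER PRIMARY KEY, id_category INTEGER, name TEXT);
CREATE TABLE param (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE category_param (id_category INTEGER, id_param INTEGER);
CREATE TABLE item_param (id INTEGER PRIMARY KEY, id_item INTEGER,
                         id_param INTEGER, var_param TEXT);
"""


def make_db():
    """In-memory database with two hypothetical categories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO category VALUES (?, ?)",
                     [(1, "Phones"), (2, "Fridges")])
    conn.executemany("INSERT INTO param VALUES (?, ?)",
                     [(1, "Screen diagonal"), (2, "Operating system"), (3, "Volume")])
    conn.executemany("INSERT INTO category_param VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 3)])
    return conn


def params_for_category(conn, id_category):
    """Load the parameters bound to a category (e.g. to build a dynamic form)."""
    rows = conn.execute(
        "SELECT p.id, p.name FROM param p "
        "JOIN category_param cp ON cp.id_param = p.id "
        "WHERE cp.id_category = ? ORDER BY p.id", (id_category,))
    return rows.fetchall()


def add_item(conn, id_category, name, values):
    """Insert a product; accept values only for its category's parameters."""
    allowed = {param_id for param_id, _ in params_for_category(conn, id_category)}
    unknown = set(values) - allowed
    if unknown:
        raise ValueError(f"parameters not in category: {sorted(unknown)}")
    cur = conn.execute("INSERT INTO item (id_category, name) VALUES (?, ?)",
                       (id_category, name))
    id_item = cur.lastrowid
    conn.executemany(
        "INSERT INTO item_param (id_item, id_param, var_param) VALUES (?, ?, ?)",
        [(id_item, param_id, str(value)) for param_id, value in values.items()])
    return id_item
```

The same params_for_category call can drive both the product edit form and the search form for a category.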

antonlustin, 2010-12-20
@antonlustin

And what is wrong with generating the query depending on the chosen filters?
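
For example, generating the query from the chosen filters could be sketched like this in Python. This is only an illustration under assumptions: the products and param_values tables, the filter format ("range" and "in"), and the demo data are all made up here, with SQLite used just to run the result.

```python
import sqlite3


def build_filter_query(filters):
    """Build (sql, args) from {param_id: ("range", lo, hi) | ("in", [values])}."""
    clauses, args = [], []
    for param_id, (kind, *rest) in filters.items():
        if kind == "range":
            lo, hi = rest
            cond, cond_args = "CAST(v.value AS REAL) BETWEEN ? AND ?", [lo, hi]
        elif kind == "in" and rest[0]:
            values = list(rest[0])
            marks = ", ".join("?" for _ in values)
            cond, cond_args = f"v.value IN ({marks})", values
        else:
            raise ValueError(f"unsupported filter for parameter {param_id}")
        # One EXISTS per chosen filter, so every filter must match.
        clauses.append(
            "EXISTS (SELECT 1 FROM param_values v "
            "WHERE v.product_id = p.id AND v.param_id = ? AND " + cond + ")")
        args += [param_id, *cond_args]
    where = " AND ".join(clauses) or "1"
    return f"SELECT p.id FROM products p WHERE {where} ORDER BY p.id", args


def demo_db():
    """Hypothetical data: parameter 1 is the OS, parameter 2 the diagonal."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE param_values (product_id INTEGER, param_id INTEGER, value TEXT);
    """)
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(1, "Phone A"), (2, "Phone B"), (3, "Phone C")])
    conn.executemany("INSERT INTO param_values VALUES (?, ?, ?)", [
        (1, 1, "Android"), (1, 2, "4.0"),
        (2, 1, "Bada"), (2, 2, "4.0"),
        (3, 1, "Android"), (3, 2, "3.2")])
    return conn
```

Values are always passed as bound arguments, so only the shape of the WHERE clause depends on which filters the user picked.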

bagyr, 2010-12-20
@bagyr

The task seems to fit relational databases well; you definitely don't need to look at benchmarks with a wild number of inserts. Some automation of the attribute filters can also be bolted on. No fundamental difficulties are immediately visible; the question is more one of scale, but it will be very hard to carry it through to the end alone.

Alexander, 2010-12-20
@akalend

I developed a similar system, and a second similar project is in development.
For now I use Sphinx for search; an article about the search is being written (it will be ready in about a week).
The main ideas on the storage structure:
- there are goods (specifications)
- there are offers (data from stores)
- there are models
- there is a catalog
All offers from stores are tied to specifications and / or models.
A robot is launched that analyzes product names and their category membership and binds them to models. The quality of the index, and consequently of the search, depends on how well these robots work.
And don't forget that YM has a lot of content managers who edit the "product specifications" and greatly ease the "life of the robots".
The main ideas of the search:
- we build an index for the catalog, the names, and the models;
- we run the query against each index;
- depending on the results obtained, we produce the appropriate output;
- we analyze the relative position of the search words;
- the parameters go through a full-text search.
To be honest, I wanted to write my own search engine for this project, but I don't have the energy. First it needs to be up and running with Sphinx.
I support the idea of MongoDB as a means of storing information and retrieving it quickly. Using MongoDB may involve one small problem (apart from the 2 GB data size limit on 32-bit OSes) that turns into a big hassle: no Sphinx indexer has been written for it yet. I was going to use it too, but for now I settled on MySQL.
At the last PHPConf there was a good talk about using Sphinx to search a million products at dostavka.ru (my aggregator site had 2 million). There is a video on the PHPConf website.

ZOXEXIVO, 2017-03-03
@ZOXEXIVO

MongoDB 3.4 supports faceted search
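
For example, the $facet aggregation stage (together with $sortByCount and $bucket, also added in 3.4) lets one query return several facets at once. A rough sketch of such a pipeline, written as plain Python data; the field names category, os and price and the facet names are assumptions about the document shape, not something from this thread:

```python
def facet_pipeline(category, price_boundaries):
    """Build an aggregation pipeline that returns two facets in one pass.

    Assumes hypothetical documents like
    {"category": "phones", "os": "Android", "price": 199}.
    """
    return [
        # Narrow down to the products the user is currently looking at.
        {"$match": {"category": category}},
        {"$facet": {
            # Count of products per operating system, most frequent first.
            "by_os": [{"$sortByCount": "$os"}],
            # Count of products per price range.
            "by_price": [{"$bucket": {
                "groupBy": "$price",
                "boundaries": list(price_boundaries),
                "default": "other",
                "output": {"count": {"$sum": 1}},
            }}],
        }},
    ]
```

With pymongo, the result of facet_pipeline("phones", [0, 100, 500]) would be passed to the aggregate method of a collection (for instance a hypothetical db.products).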