How to design a search engine architecture?

Search engines

Sergey Grigorov, 2018-07-21 04:53:53

Hello.
To begin with, let me briefly describe how the search engine works.
1. A query comes in.
2. The query terms are stemmed.
3. The index is searched for documents that contain the words from the query.
4. Documents are ranked by word frequency, by the position of the first occurrence, by the Pearson correlation coefficient, by citations from other items in the index, and by how often users pick them as a search result (this last signal comes from a neural network trained with backpropagation).
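Steps 1 to 4 can be sketched roughly as follows. This is only a minimal illustration, not the actual system: the `stem` function is a toy suffix stripper, the class and function names are made up for the example, and ranking uses plain term frequency only (no Pearson correlation, citations, or click signal).

```python
import re
from collections import Counter, defaultdict


def stem(word: str) -> str:
    """Toy suffix stripper (assumption); a real system would use a proper stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    # Lowercase, split into alphanumeric words, stem each word.
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> {doc_id: term frequency in that document}
        self.postings: dict[str, Counter] = defaultdict(Counter)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term][doc_id] += 1

    def search(self, query: str) -> list[tuple[int, int]]:
        # Sum term frequencies of all query terms per document.
        scores: Counter = Counter()
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # Highest score first; ties broken by doc_id for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

`search` returns `(doc_id, score)` pairs, so any extra ranking signals could be mixed into the score before the final sort.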
But storing a huge index in a single DBMS kills performance. How can index storage be scaled horizontally without losing much speed?
And how can pagination be implemented? You can, of course, remember the last index position, but the network does not take it into account, since it runs a full-text search across all indexes. This approach does not save storage space either. Should I create a separate cluster that stores the index, run multi-threaded search servers on it, and merge the results afterwards? But ranking happens during the search itself.
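For reference, the "separate cluster, then merge" idea is commonly done as scatter-gather: each index partition ranks its own documents and returns its best hits, and a coordinator merges those lists. The sketch below assumes scores are comparable across shards and, to stay short, uses precomputed scores instead of running a real query; `Shard` and `search_page` are hypothetical names.

```python
import heapq
from itertools import islice

Hit = tuple[float, int]  # (score, doc_id)


class Shard:
    """One index partition; holds precomputed scores for illustration only."""

    def __init__(self, scored_docs: dict[int, float]) -> None:
        self.scored_docs = scored_docs

    def top_k(self, k: int) -> list[Hit]:
        # Each shard ranks locally and returns its best k hits, best first.
        hits = [(score, doc_id) for doc_id, score in self.scored_docs.items()]
        return heapq.nlargest(k, hits)


def search_page(shards: list[Shard], page: int, per_page: int) -> list[Hit]:
    """Return one page (0-based) of the globally ranked results."""
    # Any single shard may own the entire prefix, so ask each for that many.
    need = (page + 1) * per_page
    partials = [shard.top_k(need) for shard in shards]
    # Lazily merge the descending lists, then cut out the requested page.
    merged = heapq.merge(*partials, reverse=True)
    return list(islice(merged, page * per_page, need))
```

Note that every shard has to return `(page + 1) * per_page` hits, so the cost grows with page depth. That is the price of ranking during the search rather than after it.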
In general, I would like to hear the advice of professionals.

2 answer(s)

Sergey Grigorov, 2018-07-22
@Serjio-Grig

A solution was found: replication, with the index updated daily.
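The answer gives no details of the setup, so the following is only one possible reading: the index is rebuilt offline once a day and the finished snapshot is swapped into each read replica as a whole. All names here (`Replica`, `ReplicaSet`, `publish`) are hypothetical, and the snapshot is a plain term-to-document-IDs dictionary.

```python
import itertools
import threading


class Replica:
    """A read-only search node holding one immutable index snapshot."""

    def __init__(self, snapshot: dict[str, list[int]]) -> None:
        self._snapshot = snapshot
        self._lock = threading.Lock()

    def swap(self, new_snapshot: dict[str, list[int]]) -> None:
        # Replace the whole snapshot at once; readers never see a half-built index.
        with self._lock:
            self._snapshot = new_snapshot

    def lookup(self, term: str) -> list[int]:
        # Grab the current snapshot reference, then read outside the lock.
        with self._lock:
            snapshot = self._snapshot
        return snapshot.get(term, [])


class ReplicaSet:
    def __init__(self, replicas: list[Replica]) -> None:
        self._replicas = replicas
        self._next = itertools.cycle(replicas)

    def lookup(self, term: str) -> list[int]:
        # Round-robin reads across replicas to spread the query load.
        return next(self._next).lookup(term)

    def publish(self, new_snapshot: dict[str, list[int]]) -> None:
        # Assumed to be called once per day, after the offline rebuild finishes.
        for replica in self._replicas:
            replica.swap(new_snapshot)
```

Reads scale with the number of replicas, at the cost of results being up to a day stale.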

Dimonchik, 2018-07-21
@dimonchik2013

Sphinxsearch + MVA (multi-valued attributes).
There are no miracles with pagination: it can only be done as post-processing, after the results have been extracted from the engine.
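A minimal sketch of what such post-processing could look like, under stated assumptions: `fetch_top` is a hypothetical callable standing in for the engine query, `click_boost` is a hypothetical per-document adjustment (for example, from the click model in the question), and the `max_matches` parameter is merely named after the Sphinx setting and is not a call into Sphinx.

```python
from typing import Callable

EngineHit = tuple[int, float]  # (doc_id, engine score)


def paginate(
    fetch_top: Callable[[int], list[EngineHit]],
    click_boost: dict[int, float],
    page: int,
    per_page: int,
    max_matches: int = 1000,
) -> list[int]:
    """Re-rank a capped candidate set, then slice out one 0-based page."""
    # 1. Pull a bounded number of candidates out of the engine.
    candidates = fetch_top(max_matches)
    # 2. Post-process: add the boost, sort best first, break ties by doc_id.
    reranked = sorted(
        candidates,
        key=lambda hit: (-(hit[1] + click_boost.get(hit[0], 0.0)), hit[0]),
    )
    # 3. Pagination is just a slice of the re-ranked list.
    start = page * per_page
    return [doc_id for doc_id, _ in reranked[start:start + per_page]]
```

Documents outside the first `max_matches` candidates can never appear on any page, so the cap bounds both the cost and how deep pagination can go.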