How to return search results faster than Google?

Search engines

BonBon Slick, 2019-01-08 12:38:07

Results: approximately 104,000,000 (0.37 sec.)
Even with a local cache this is very hard to achieve, yet here it happens over an HTTP request and with that many results. Windows 10 searches for a file on the local system hundreds of times slower.
It is also worth taking into account fuzzy matching and the recommendations returned in addition to the results already found.
How is this speed achieved?
How can you do it even faster?

3 answer(s)

stratosmi, 2019-01-16
@BonBonSlick

This is quite easy to implement:
Read, for example:
Oleg Bartunov, Alexander Korotkov
GIN Improvements
Full-text search in PostgreSQL in milliseconds
PGConf.EU-2012, Prague
https://wiki.postgresql.org/images/2/25/Full -text_...
Google's achievement is not speed. It is the adequacy of the results, which is called "relevance".
Smart neural networks and all that.
For speed alone, the most basic full-text search (FTS) is enough.
The algorithm is primitive; you could even implement it over a weekend as a warm-up.
Or just use a ready-made, very fast solution:
sphinxsearch.com
Search is slow on your local machine because searching is not the machine's primary function.
If developers thought that search was a function of paramount importance, they would simply allocate more resources for indexing, index storage and more RAM for caching, etc.
But this would require taking resources away from the more important functions of the computer.
FTS algorithm:
Preparation:
1) Split the text into words.
2) Discard auxiliary words (prepositions, etc.). What remains are the so-called tokens.
3) Run the resulting tokens through a stemming algorithm: snowball.tartarus.org/algorithms/russian/stemmer.html
4) Store the resulting words without endings (called terms) in a compressed bitmap, for example roaringbitmap.org
It will look something like this:
Source objects for search:
a) "Hello, bear"
b) "Strength is in bears"
a) -> "Hello", "bear" -> "hello", "bear"
b) -> "Strength", "bears" -> "strength", "bear"
The index then looks like this (one bit per source object: the first bit is a, the second is b):
"hello" 10
"bear" 11
"strength" 01
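The preparation steps and the resulting index can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the stop-word list is a tiny made-up one, `stem` is a toy stand-in for a real Snowball stemmer, and plain Python integers play the role of the bitmaps instead of a Roaring bitmap library.

```python
import re

# Hypothetical stop-word list; a real system would use a full per-language list.
STOP_WORDS = {"is", "in", "the", "a", "an", "of"}

def stem(word: str) -> str:
    """Toy stand-in for a Snowball stemmer: strips a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def terms(text: str) -> list[str]:
    """Steps 1-3: split into words, drop stop words, stem what is left."""
    words = re.findall(r"\w+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def build_index(docs: list[str]) -> dict[str, int]:
    """Step 4: map each term to a bitmap; bit i is set if document i contains it."""
    index: dict[str, int] = {}
    for doc_id, text in enumerate(docs):
        for term in terms(text):
            index[term] = index.get(term, 0) | (1 << doc_id)
    return index

def show(bitmap: int, n_docs: int) -> str:
    """Render a bitmap with document 0 as the leftmost character."""
    return "".join("1" if (bitmap >> i) & 1 else "0" for i in range(n_docs))

docs = ["Hello, bear", "Strength is in bears"]
index = build_index(docs)
for term, bitmap in index.items():
    print(term, show(bitmap, len(docs)))
# hello 10
# bear 11
# strength 01
```

An integer used as a bitmap is fine for a handful of documents; for millions of documents a compressed bitmap is what keeps the index small and the set operations fast.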
Search for the word "bear":
1) Look it up in the index; we get 11, which means that the word we are interested in occurs in both the first and the second phrase.
2) Sort the result by relevance: nlpx.net/archives/57 or https://ru.wikipedia.
Search for the phrase "hello bear":
1) We search by the first word, we get 10
2) We search by the second word, we get 11
3) A bitwise AND to intersect the results gives 10
4) Sort by relevance
On a local computer, search is simply not the main task. Making local search fast is not a technical problem.
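The lookup and intersection steps can be sketched as follows. This is a minimal illustration, not a production design: the index is written out by hand for the two example phrases, and the relevance score is a crude term-frequency ratio chosen only as a placeholder for a real ranking function. Note that in the Python literals the lowest bit is the first phrase, so the digits read right to left compared with the table in the text.

```python
from functools import reduce

# The two example phrases, already reduced to terms.
DOC_TERMS = [["hello", "bear"], ["strength", "bear"]]

# Bitmap index: bit i is set when document i contains the term.
# 0b01 = first phrase only, 0b10 = second phrase only, 0b11 = both.
INDEX = {"hello": 0b01, "bear": 0b11, "strength": 0b10}

def match(query_terms: list[str]) -> int:
    """Intersect the bitmaps of all query terms with bitwise AND."""
    bitmaps = [INDEX.get(term, 0) for term in query_terms]
    if not bitmaps:
        return 0
    return reduce(lambda a, b: a & b, bitmaps)

def doc_ids(bitmap: int) -> list[int]:
    """Expand a bitmap into the list of matching document numbers."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids

def search(query_terms: list[str]) -> list[int]:
    """Match first, then sort the hits by a placeholder relevance score."""
    hits = doc_ids(match(query_terms))

    def score(doc_id: int) -> float:
        doc = DOC_TERMS[doc_id]
        return sum(doc.count(term) for term in query_terms) / len(doc)

    return sorted(hits, key=score, reverse=True)

print(search(["bear"]))            # both phrases match
print(search(["hello", "bear"]))   # only the first phrase matches
```

The expensive part of a query is a handful of dictionary lookups and AND operations on machine words, which is why the matching step is fast regardless of how large the reported result count is.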

Ivan Shumov, 2019-01-08
@inoise

As if they would tell you everything, right? :) There is optimization at every step, from the CDN and the program code to data indexing and parallel queries to databases.
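To make "parallel queries to databases" concrete, here is a small scatter-gather sketch in Python. Everything in it is an assumption made for illustration: the shards are plain in-memory lists, `search_shard` stands in for a query against one database, and the score is just the number of term occurrences.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical shards: each one holds part of the document collection.
SHARDS = [
    ["hello bear", "bear cub"],
    ["strength is in bears", "hello world"],
]

def search_shard(shard: list[str], term: str) -> list[tuple[int, str]]:
    """Query one shard; returns (score, document) pairs for matching documents."""
    hits = []
    for doc in shard:
        count = doc.split().count(term)
        if count:
            hits.append((count, doc))
    return hits

def search_all(term: str, top_k: int = 3) -> list[tuple[int, str]]:
    """Send the query to every shard in parallel, then merge the best hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(search_shard, SHARDS, [term] * len(SHARDS))
        hits = [hit for partial in partials for hit in partial]
    return heapq.nlargest(top_k, hits)

print(search_all("hello"))
```

With real databases the latency of the whole query is roughly that of the slowest shard rather than the sum of all shards, which is the point of querying them in parallel.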

Lander, 2019-01-08
@usdglander

The trick lies in that very "approximately". The result count is continuously updated for each query, and the first few pages of results are pre-rendered as static HTML. So the answer to your query is not assembled from storage; what has already been generated is simply served.
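The idea of serving already generated pages can be sketched like this. It is a hedged illustration only: `slow_search` is a made-up stand-in for the expensive query against the real index, `PAGES_TO_CACHE` is an arbitrary choice, and a real system would also need expiry and invalidation, which are left out here.

```python
import html

PAGES_TO_CACHE = 3  # hypothetical: only the first few result pages are kept

def slow_search(query: str, page: int) -> list[str]:
    """Hypothetical stand-in for the expensive query against the real index."""
    return [f"result {page * 10 + i} for {query}" for i in range(10)]

class ResultPageCache:
    """Keeps rendered HTML for the first pages of each query."""

    def __init__(self, backend=slow_search):
        self.backend = backend
        self.pages: dict[tuple[str, int], str] = {}
        self.backend_calls = 0

    def get(self, query: str, page: int = 0) -> str:
        # Normalize case and whitespace so equivalent queries share one entry.
        key = (" ".join(query.lower().split()), page)
        if key in self.pages:
            return self.pages[key]  # served as ready-made HTML
        self.backend_calls += 1
        items = self.backend(key[0], page)
        body = "".join(f"<li>{html.escape(item)}</li>" for item in items)
        page_html = f"<ol>{body}</ol>"
        if page < PAGES_TO_CACHE:
            self.pages[key] = page_html
        return page_html

cache = ResultPageCache()
first = cache.get("Hello  Bear")
second = cache.get("hello bear")   # same normalized query, no backend call
print(first == second, cache.backend_calls)
```

Since most users never go past the first pages, caching only those covers the bulk of the traffic while keeping the cache small.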