Fast LIKE over 1 million rows: what should I do?

PHP

SwoDs, 2016-07-26 15:59:09

Good afternoon. The project needs to use FIAS data: when the user starts typing a street name, we should show whatever matches the input. For example, if they type "Lenin", they should be offered Leninsky Prospekt, Lenin St., and so on.
Full-text search does not fit, because the word then has to be typed in full. What should I do, and what should I use?

10 answer(s)

Alexander Aksentiev, 2016-07-26
@Sanasol

Use Sphinx,
or https://dadata.ru/

xmoonlight, 2016-07-26
@xmoonlight

First, store every phrase as a "hash": its letters in order of first appearance, with repeated letters dropped. For example, 'mama myla ramu' (Russian for "mom washed the frame") => 'ma ylru'.
You can additionally build a second cache, sorted in descending order by how often each letter repeats: [m-4][y-1][l-1][a-4][(space)-2][r-1][u-1] => 'ma ylru' (for this example the result stays the same).
Then search by halves of the hash of the entered string (for an odd length, round up). For 'ma ylru':
1. If no matches are found, the order is: 'ma y' => 'ma' => 'm'
2. When a match is found, the order is: 'ma ylr' => 'ma yl'
As soon as the result set comes back empty, take the previous, SMALLEST non-empty result.
This makes it more likely that letters missed while typing are still caught.
You can create a separate table for all the words and link it to the main data.
Then the selection itself:
1. Transform the input string in the same way and select with LIKE 'ma yl%'
(several selections are possible, to check for a missing letter), remembering the result of the selection.
2. Within that result, look for the full string with the same kind of LIKE: 'mama myla ramu%'
3. On the next search, if the hash has not become shorter and the characters within the length of the previous hash have not changed, search DIRECTLY within the result of step 1 (and remember the new result again). This saves time, because the search effectively runs through the previous cache.
So the more letters are typed, the fewer records we have to go through.
And the fewer records we go through, the more time is left over for additional queries, such as fuzzy search.
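
The "hash" described in this answer (letters in order of first appearance, repeats dropped) can be sketched in a few lines. This is only an illustration, not the author's code: the function name `letter_hash` is made up here, and the input is the transliterated Russian phrase 'mama myla ramu' ("mom washed the frame"), which gives the 'ma ylru' value quoted in the answer.

```python
def letter_hash(text: str) -> str:
    """Keep only the first occurrence of each character, preserving order."""
    # dict.fromkeys keeps insertion order and silently drops repeated keys.
    return "".join(dict.fromkeys(text))


if __name__ == "__main__":
    print(letter_hash("mama myla ramu"))  # ma ylru
    print(letter_hash("mama myl"))        # ma yl
```

The hash of a partial input is a prefix of the hash of the full phrase only as long as the typed text is itself a prefix of the phrase, which is what makes the LIKE '...%' lookup on the hash column possible.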

Optimus, 2016-07-26
@marrk2

www.mysql.ru/docs/man/Fulltext_Search.html
In boolean mode, the asterisk (*) is a truncation operator. Unlike the other operators, it must be added at the end of a word, not at the beginning.
apple*
matches "apple", "apples", "applesauce", and "applet".
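
For an autocomplete box, the truncation operator means every typed word can be turned into a prefix term. Below is a rough sketch of building such a boolean-mode expression from raw input. The helper name `boolean_prefix_query`, the table name `addrobj`, and the column `formalname` are assumptions for illustration, and the `%s` placeholder assumes a MySQL driver that uses the "format" parameter style.

```python
import re


def boolean_prefix_query(user_input: str) -> str:
    """Turn raw input into a boolean-mode expression made of prefix terms."""
    # \w+ keeps letters, digits and underscores only, so boolean-mode
    # operators typed by the user (+ - * " ( ) ~ < > @) are dropped.
    words = re.findall(r"\w+", user_input)
    # "+" requires the term to be present, "*" makes it a prefix match.
    return " ".join(f"+{word}*" for word in words)


# Hypothetical table and column names; pass the expression as a bound parameter.
SQL = (
    "SELECT formalname FROM addrobj "
    "WHERE MATCH(formalname) AGAINST (%s IN BOOLEAN MODE) LIMIT 10"
)

if __name__ == "__main__":
    print(boolean_prefix_query("Lenin pr"))  # +Lenin* +pr*
```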

Rsa97, 2016-07-26
@Rsa97

If the name is stored not as a single string 'st. Lenin' but as two fields, `name` = 'Lenin' and `type` = 'st.', then LIKE 'Len%' will use the index on `name`.
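
The reason a pattern without a leading wildcard can use an index is that, in sorted order, all values sharing a prefix sit next to each other, so the lookup becomes a range scan. The following sketch imitates that with a sorted Python list and `bisect`. It is an analogy, not how MySQL is implemented, and the street names are invented sample data.

```python
from bisect import bisect_left


def prefix_range(sorted_names: list[str], prefix: str) -> list[str]:
    """Return all names starting with prefix, using two binary searches."""
    lo = bisect_left(sorted_names, prefix)
    # Upper bound: the prefix followed by the largest code point.
    # Assumes no name contains U+10FFFF right after the prefix.
    hi = bisect_left(sorted_names, prefix + "\U0010ffff")
    return sorted_names[lo:hi]


if __name__ == "__main__":
    names = sorted(["Lenina", "Leninsky", "Lermontova", "Kirova", "Lesnaya"])
    print(prefix_range(names, "Len"))  # ['Lenina', 'Leninsky']
```

With 'st. Lenin' stored as one string, the typed text is no longer a prefix of the value, so no such range exists and every row has to be checked.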

Philipp, 2016-07-26
@zoonman

sphinxsearch.com/docs/current.html#conf-expand-keywords

Egor Kazantsev, 2016-07-26
@saintbyte

Has Elasticsearch already been suggested?

lxfr, 2016-07-26
@lxfr

NoSQL?

Puma Thailand, 2016-07-26
@opium

sphinx

Cage, 2016-08-04
@Cage

Overall, everything has already been said above; here is a summary for searching in MySQL:
1. Index parentguid and formalname; the index on the name can be limited to the first 6-7 characters. Search with `parentguid` = '...' AND `formalname` LIKE 'lenin%', and put parentguid first in the search condition: it will be faster.
2. Full-text search, if the engine allows it.
Option 1 is probably the preferred one.
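
Option 1 might look roughly like the sketch below. It assumes the pattern is meant as a prefix match ('lenin%'), since a leading % would prevent the index from being used. The table name `addrobj`, the index name, and the helper `like_prefix` are assumptions for illustration; the `%s` placeholders again assume a driver with the "format" parameter style.

```python
def like_prefix(user_input: str) -> str:
    """Build a prefix pattern, escaping LIKE wildcards in the user's text."""
    # Backslash is the default LIKE escape character in MySQL, so escape it
    # first, then the two wildcard characters.
    escaped = (
        user_input.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return escaped + "%"


# Hypothetical names; formalname(7) indexes only the first 7 characters.
CREATE_INDEX = (
    "CREATE INDEX idx_parent_name ON addrobj (parentguid, formalname(7))"
)
QUERY = (
    "SELECT formalname FROM addrobj "
    "WHERE parentguid = %s AND formalname LIKE %s LIMIT 10"
)

if __name__ == "__main__":
    print(like_prefix("lenin"))  # lenin%
```

Binding the pattern as a parameter rather than concatenating it into the SQL string keeps user input out of the query text.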

al_gon, 2016-10-29
@al_gon

solr