Fast LIKE over 1 million rows: how to do it?
Good afternoon. The project needs to use FIAS data: when the user starts typing a street name, we have to display whatever matches the input, i.e. if they typed "Lenin" they are offered a choice of Leninsky Prospekt, st. Lenin.
Full-text search is not suitable, since the word then has to be typed in full. What should I do, what should I use?
First, store a hash of every string: its characters in order of first appearance, with repeated characters dropped.
'mama myla ramu' ("mom washed the frame") => 'ma ylru'
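A minimal sketch of that hash, assuming it simply keeps the first occurrence of each character (the function name `signature` is made up for illustration, and the example phrase is taken in transliteration, 'mama myla ramu'):

```python
def signature(text: str) -> str:
    """Keep the first occurrence of every character, in order of appearance."""
    # dict preserves insertion order, so duplicates are dropped but order stays.
    return "".join(dict.fromkeys(text))


print(signature("mama myla ramu"))  # ma ylru
```

A useful property: the signature of a prefix is always a prefix of the signature of the whole string, so it can be used as a coarse pre-filter for prefix search.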
You can additionally build another cache, sorted in descending order by the number of repetitions of each letter: [m-4][y-1][l-1][a-4][(space)-2][r-1][u-1] => 'ma ylru' (for this example the result stays the same...)
Then search by halves of the hash of the entered string 'ma ylru' (for an odd length, round up):
1. If no matches are found, the order is: 'ma y' => 'ma' => 'm'
2. If a match is found, the order is: 'ma ylr' => 'ma yl'. As soon as a query returns nothing, take the previous result, i.e. the SMALLEST non-empty one.
This makes it more likely to catch letters that were left out while typing.
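The halving order above reads like a binary search over the prefix length of the hash. A rough sketch under that assumption, with the stored hashes held in a plain list instead of a table (`narrow` and `stored` are hypothetical names):

```python
def narrow(sig: str, stored: list[str]) -> tuple[str, list[str]]:
    """Find the longest prefix of sig that still matches at least one stored hash."""
    lo, hi = 0, len(sig)      # lo: longest prefix length known to match
    best: list[str] = []      # matches for that prefix (smallest non-empty result)
    while lo < hi:
        mid = (lo + hi + 1) // 2          # round up, as in the answer
        hits = [s for s in stored if s.startswith(sig[:mid])]
        if hits:
            lo, best = mid, hits          # match found: try a longer prefix
        else:
            hi = mid - 1                  # nothing found: try a shorter prefix
    return sig[:lo], best


# The user typed a wrong last letter: 'ma ylrx' instead of 'ma ylru'.
print(narrow("ma ylrx", ["ma ylru", "ma yk", "lenis pro"]))
# ('ma ylr', ['ma ylru'])
```

For a 7-character hash this tries lengths 4, then 6 and 5 after a hit, or 2 and 1 after a miss, which is the same order as in the two items above.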
You can create a separate table for all the words and link it to the main data.
Then the selection is layered:
1. Transform the input string in the same way and select with LIKE 'ma ylru%' (several selections are possible, to check for a missing letter), remembering the result of the selection.
2. Within this result, look for the full string with the same kind of LIKE: 'mama myla ramu%'
3. On the next search, if the hash has not become shorter and the characters within the length of the previous hash have not changed, search IMMEDIATELY in the result of item 1 (and again remember the result), which saves time (i.e. each search effectively runs over the previous cache).
So the more letters are typed, the fewer records we have to go through.
And the fewer records we go through, the more time is left over for additional queries, e.g. for fuzzy search.
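The three steps can be sketched in memory as follows. This is only an illustration of the idea, not the answer's actual implementation: `StagedSearch` is a made-up class, `str.startswith` stands in for LIKE '...%', and the hash is assumed to be "first occurrence of each character".

```python
def signature(text: str) -> str:
    # First occurrence of each character, in order (assumed hash).
    return "".join(dict.fromkeys(text))


class StagedSearch:
    def __init__(self, rows: list[str]):
        # Precompute (hash, full string) pairs once, like a separate lookup table.
        self.table = [(signature(r), r) for r in rows]
        self.prev_sig = None
        self.prev_pool = self.table

    def search(self, typed: str) -> list[str]:
        sig = signature(typed)
        # Step 3: reuse the previous result only if the new hash extends the old one.
        if self.prev_sig is not None and sig.startswith(self.prev_sig):
            pool = self.prev_pool
        else:
            pool = self.table
        # Step 1: coarse filter by hash prefix; remember the result.
        pool = [(s, r) for s, r in pool if s.startswith(sig)]
        self.prev_sig, self.prev_pool = sig, pool
        # Step 2: exact prefix filter on the full string within that result.
        return [r for _, r in pool if r.startswith(typed)]


s = StagedSearch(["mama myla ramu", "mama mia", "lenina"])
print(s.search("mama"))     # ['mama myla ramu', 'mama mia']
print(s.search("mama my"))  # ['mama myla ramu'], searched within 2 cached rows
```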
www.mysql.ru/docs/man/Fulltext_Search.html
* The asterisk is a truncation operator. Unlike other operators, it must be added at the end of a word, not at the beginning.
apple*
... ``apple'', ``apples'', ``applesauce'', and ``applet''.
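A small sketch of how the typed text could be turned into such a boolean-mode expression. The table `streets` and column `name` are hypothetical, the `%s` placeholder assumes a typical Python MySQL driver, and a FULLTEXT index on the column is assumed to exist. Note that very short words may be ignored because of the server's minimum token length setting.

```python
import re

# Hypothetical table/column names; requires a FULLTEXT index on `name`.
SQL = "SELECT name FROM streets WHERE MATCH(name) AGAINST (%s IN BOOLEAN MODE)"


def boolean_prefix_query(user_input: str) -> str:
    """Turn 'len pro' into '+len* +pro*' (every word required, matched as a prefix)."""
    # Keep only word characters so that operators typed by the user
    # (+ - * " and so on) cannot change the meaning of the expression.
    words = re.findall(r"\w+", user_input)
    return " ".join(f"+{w}*" for w in words)


print(boolean_prefix_query("Lenin"))    # +Lenin*
print(boolean_prefix_query("len pro"))  # +len* +pro*
```

The resulting string would be passed as the single parameter of `SQL`, e.g. `cursor.execute(SQL, (boolean_prefix_query(text),))`.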
If the name is stored not as 'st. Lenin' but as two fields, `name` = 'Lenin' and `type` = 'str', then LIKE 'Len%' will use the index on `name`.
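To show the shape of such a query, here is a runnable sketch that uses an in-memory SQLite database as a stand-in for MySQL. The table `streets`, the sample rows, and the 'ave' type are made up; SQLite has its own rules for when LIKE can use an index, so this illustrates only the schema and the query, not the MySQL execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Street name and street type are kept in separate columns.
conn.execute("CREATE TABLE streets (name TEXT, type TEXT)")
conn.execute("CREATE INDEX idx_streets_name ON streets (name)")
conn.executemany(
    "INSERT INTO streets (name, type) VALUES (?, ?)",
    [("Lenin", "str"), ("Leninsky", "ave"), ("Pushkin", "str")],
)


def suggest(prefix: str) -> list[str]:
    # The pattern has no leading wildcard, so it is a pure prefix match.
    rows = conn.execute(
        "SELECT name, type FROM streets WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),
    ).fetchall()
    return [f"{name} ({kind})" for name, kind in rows]


print(suggest("Len"))  # ['Lenin (str)', 'Leninsky (ave)']
```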
In general, everything has already been written above; to summarize for search in MySQL:
1. Indexes on parentguid and formalname; the index on the name can be limited to the first 6-7 characters. Search with `parentguid` = '...' AND `formalname` LIKE 'lenin%' (a pattern with a leading % cannot use the index), and put parentguid first in the search condition, it will be faster.
2. Full-text search if the engine allows.
Option 1 is probably the preferred one.
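A sketch of option 1 from the application side, under several assumptions: the table is called `addrobj`, the two indexes are combined into one composite index (only one possible reading of the advice), the `%s` placeholders assume a typical Python MySQL driver, and backslash is the LIKE escape character (the MySQL default). The helper escapes `%` and `_` in the user's input so they are matched literally.

```python
# Assumed DDL: composite index, name limited to its first 7 characters.
DDL = "CREATE INDEX idx_parent_name ON addrobj (parentguid, formalname(7))"

# Assumed query: equality on parentguid first, then a prefix match on the name.
SQL = (
    "SELECT formalname FROM addrobj "
    "WHERE parentguid = %s AND formalname LIKE %s "
    "LIMIT 20"
)


def like_prefix(user_input: str) -> str:
    """Build a prefix pattern, escaping LIKE wildcards typed by the user."""
    # Escape the escape character first, then % and _.
    escaped = (
        user_input.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return escaped + "%"


def build_params(parent_guid: str, typed: str) -> tuple[str, str]:
    # Parameters in the same order as the placeholders in SQL.
    return (parent_guid, like_prefix(typed))


print(build_params("some-parent-guid", "lenin"))
# ('some-parent-guid', 'lenin%')
```

The query would then be run as `cursor.execute(SQL, build_params(guid, text))`; the LIMIT of 20 is arbitrary and only keeps the suggestion list short.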