becks, 2014-12-19 12:40:34
Sphinx

How do I properly search for license numbers using Sphinx?

I have an RT index. Morphology is currently configured to lemmatize the three main languages (Russian, English, German):

morphology	 = lemmatize_ru, lemmatize_en, lemmatize_de
Soon I will add processing for other languages (French, Italian, etc.), but using stemmers rather than lemmatizers. Keyword search works fine right now. The problem is searching for numeric information. In this case, a license number can look like:
1) 423452352364265 - a plain sequence of digits
2) 42-3452352(36426)5 - the same sequence with different separators
The search functionality needed for license numbers:
1) Full match:
Hypothetical sample query: SELECT * FROM rt_index WHERE MATCH('423452352364265');
2) Starts with:
Hypothetical sample query: SELECT * FROM rt_index WHERE MATCH('42345*');
3) Contains:
Hypothetical sample query: SELECT * FROM rt_index WHERE MATCH('*523642*');
4) Ends with:
Hypothetical sample query: SELECT * FROM rt_index WHERE MATCH('*265');
Each of these queries should return both of the entries above.
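To make the four modes concrete, here is a minimal sketch of a helper that builds the text for MATCH() when the search is restricted to one field. The field name `numbers` and the function name are hypothetical, and the wildcard forms only make sense if prefix/infix indexing is enabled for that field.

```python
def build_number_match(digits: str, mode: str, field: str = "numbers") -> str:
    """Return the text to place inside MATCH('...') for one search mode.

    `field` is a hypothetical full-text field that holds separator-free numbers.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a separator-free string of ASCII digits")
    patterns = {
        "exact": digits,               # 1) full match
        "prefix": digits + "*",        # 2) starts with
        "infix": "*" + digits + "*",   # 3) contains
        "suffix": "*" + digits,        # 4) ends with
    }
    if mode not in patterns:
        raise ValueError("unknown mode: " + mode)
    # "@field query" limits the match to a single field.
    return "@{} {}".format(field, patterns[mode])


print(build_number_match("42345", "prefix"))
```

Rejecting anything but digits up front also keeps user input from injecting query operators into the MATCH() text.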
Since the index is built with lemmatizers and stemmers, wildcard search with an asterisk (*) will not work, as far as I understand. And even if * could be used, the index would grow enormously.
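A rough way to see where the growth would come from, assuming an indexing scheme that stores every infix of a token as its own keyword (this is an assumption for illustration, not a measurement of Sphinx): count the distinct substrings of a single 15-digit number for a few minimum infix lengths.

```python
def infixes(token: str, min_len: int) -> set:
    """All distinct substrings of `token` with at least `min_len` characters."""
    n = len(token)
    return {token[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


token = "423452352364265"
for min_len in (1, 3, 5):
    # Number of extra keywords this one token would contribute.
    print(min_len, len(infixes(token, min_len)))
```

A token of length n has at most (n - k + 1)(n - k + 2) / 2 substrings of length k or more, so each long number can add dozens of keywords, which is why limiting this to one small dedicated field looks attractive.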
The only solution I see is this: preprocess the text, extract all numeric information, strip the separators, and store the result in a separate field. Then set up another index (stemming, wildcard search) that covers only this field, and run number searches against that field only. It is a clumsy solution, but it will most likely work. There is still the problem of matching the converted number back to the number as it appears in the text.
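The preprocessing step could look roughly like the sketch below. It is only an illustration: the set of allowed separators, the `min_digits` threshold, and the function name are my assumptions. Keeping the start and end offsets of each match is one way to map the converted number back to its original spelling in the text.

```python
import re

# Assumption: a number starts and ends with a digit and may contain
# hyphens, dots, slashes, and parentheses in between (no whitespace,
# so two neighbouring numbers are not glued together).
NUMBER_RE = re.compile(r"\d[\d\-./()]*\d|\d")


def extract_numbers(text: str, min_digits: int = 5):
    """Return (normalized_digits, start, end) for each number-like run."""
    found = []
    for m in NUMBER_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())  # strip every separator
        if len(digits) >= min_digits:          # skip short, unrelated numbers
            found.append((digits, m.start(), m.end()))
    return found


text = "Licenses: 423452352364265 and 42-3452352(36426)5."
for digits, start, end in extract_numbers(text):
    # The offsets point back at the original spelling in the text.
    print(digits, "<-", text[start:end])
```

The normalized strings would go into the dedicated number field, while the offsets can be stored alongside or recomputed at display time to highlight the original form.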
Are there other, more elegant solutions?
