PHP
gibigate, 2015-11-18 14:57:20

How to get the morphological variants of a word in Sphinx?

Here is an example to make the task clearer.
There are many rows that use different forms of a word:
text run
text runner
text running
text runs
text2 run
text2 runner
text3 running
A query for the word run returns all of the rows above.
Question:
How can I select one row from text run*, one from text2 run*, and one from text3 run*, that is, one row for each textN + keyword combination?
Or:
How can I get all forms of the word run (that is, results such as runner, running, etc.)?
Morphology is enabled.


3 answer(s)
Andrew, 2015-11-18
@R0dger

Take a look here: chakrygin.ru/2013/07/sphinx-search.html

Puma Thailand, 2015-11-18
@opium

1) The results are already sorted by weight, so take the first one from the output; you can set a limit of 1 in the query.
2) There is no way to do this with Sphinx; use some kind of dictionary.
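Both suggestions can be sketched client-side in Python. This is a minimal sketch under stated assumptions: the hits are (title, weight) pairs already sorted by weight, highest first; the group key is the first word of the title (textN); and the weights and the tiny FORMS dictionary are hypothetical stand-ins for real search results and a real morphological dictionary.

```python
from typing import Dict, Iterable, List, Tuple

# A hit is (title, weight); the list is assumed to be sorted by weight, highest first.
Hit = Tuple[str, int]


def first_hit_per_group(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the first (highest-weight) hit for each textN prefix.

    Assumes the group key is the first word of the title, e.g. "text2".
    """
    best: Dict[str, Hit] = {}
    for title, weight in hits:
        group = title.split()[0]
        # setdefault stores only the first hit seen for each group
        best.setdefault(group, (title, weight))
    return best


# Hypothetical lemma -> forms dictionary; a real one would be loaded
# from an external morphological dictionary, not hard-coded.
FORMS: Dict[str, List[str]] = {
    "run": ["run", "runs", "running", "ran", "runner"],
}


def forms_of(lemma: str) -> List[str]:
    """Return all known forms of a lemma, or just the lemma itself if unknown."""
    return FORMS.get(lemma, [lemma])


if __name__ == "__main__":
    hits = [  # hypothetical weights
        ("text run", 2500),
        ("text2 run", 2400),
        ("text runner", 1600),
        ("text3 running", 1500),
        ("text2 runner", 1400),
    ]
    for group, hit in first_hit_per_group(hits).items():
        print(group, hit)
    print(forms_of("run"))
```

Because setdefault keeps only the first hit per group, the result is the best hit per group only if the input really arrives in weight order.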

klirichek, 2016-02-25
@klirichek

There is no way to get the derived forms of a word: AOT works in the opposite direction by design (it gets the base lemmas FROM the derived forms).
You can get the lemmas, though it is a bit tedious.
To do this, add an index with the required lemmatizer to the config. It does not need to be indexed: you can declare it as an RT index, and the daemon will start with it even without any index files.
Something like:

index fake_rt
{
  ...
  rt_field = fake
  morphology = lemmatize_ru_all, lemmatize_en_all, lemmatize_de_all
}

Then turn on profiling, run a query against this index with the desired word, and look at the query plan (in the session below the index is named rt_full).
mysql> set profiling=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from rt_full where match ('стали');
Empty set (0.00 sec)

mysql> show plan;
+------------------+--------------------------------------------------------------------------------------------------------+
| Variable         | Value                                                                                                  |
+------------------+--------------------------------------------------------------------------------------------------------+
| transformed_tree | OR(
  AND(KEYWORD(сталь, querypos=1, morphed)), 
  AND(KEYWORD(стать, querypos=1, morphed)))           |
+------------------+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
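The same session can be scripted from any client that speaks the MySQL protocol. This is a minimal sketch, assuming a PEP 249 (DB-API) cursor connected to searchd's SphinxQL listener (for example one obtained from pymysql, which is not shown here), a plain word with no query-syntax characters, and the fake_rt index from the config above. The names plan_for and StubCursor are hypothetical; the stub replays the output above so the example runs without a server.

```python
from typing import Any, List, Optional, Sequence, Tuple


def plan_for(cursor: Any, word: str, index: str = "fake_rt") -> Optional[str]:
    """Run the profiling session shown above and return transformed_tree.

    `cursor` is any DB-API cursor connected to searchd over SphinxQL;
    `index` is assumed to be a trusted identifier, not user input.
    """
    cursor.execute("SET profiling=1")
    # The query matters only for its side effect: it produces the plan.
    cursor.execute(f"SELECT * FROM {index} WHERE MATCH(%s)", (word,))
    cursor.fetchall()
    cursor.execute("SHOW PLAN")
    for variable, value in cursor.fetchall():
        if variable == "transformed_tree":
            return value
    return None


class StubCursor:
    """Stand-in cursor that replays the SHOW PLAN output from the session above."""

    def __init__(self, plan: str) -> None:
        self._plan = plan
        self._rows: Sequence[Tuple[str, str]] = ()
        self.executed: List[Tuple[str, Optional[tuple]]] = []

    def execute(self, sql: str, params: Optional[tuple] = None) -> None:
        self.executed.append((sql, params))
        self._rows = [("transformed_tree", self._plan)] if sql == "SHOW PLAN" else []

    def fetchall(self) -> Sequence[Tuple[str, str]]:
        return self._rows


if __name__ == "__main__":
    plan = (
        "OR(\n  AND(KEYWORD(сталь, querypos=1, morphed)), \n"
        "  AND(KEYWORD(стать, querypos=1, morphed)))"
    )
    print(plan_for(StubCursor(plan), "стали"))
```

With a real connection, the cursor would come from the client library instead of StubCursor; the default tuple-returning cursor is assumed, not a dict cursor.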

From the last value of the plan (transformed_tree), you can extract the desired forms with an external regexp. Here the word стали maps to two lemmas: сталь ("steel") and стать ("to become").
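A regexp of that kind could look like the Python sketch below. It assumes, as in the output above, that each lemma appears as KEYWORD(term, ..., morphed) in the transformed_tree text and that the term itself contains no commas or parentheses; the function name lemmas_from_plan is hypothetical.

```python
import re
from typing import List

# Matches KEYWORD(<term>, ..., morphed) nodes in the transformed_tree text.
# Assumes the term contains no commas or parentheses.
MORPHED_KEYWORD = re.compile(r"KEYWORD\(([^,()]+),[^()]*\bmorphed\)")


def lemmas_from_plan(plan: str) -> List[str]:
    """Return the morphed keywords from a SHOW PLAN tree, in order, without duplicates."""
    lemmas: List[str] = []
    for term in MORPHED_KEYWORD.findall(plan):
        term = term.strip()
        if term not in lemmas:
            lemmas.append(term)
    return lemmas


if __name__ == "__main__":
    plan = (
        "OR(\n  AND(KEYWORD(сталь, querypos=1, morphed)), \n"
        "  AND(KEYWORD(стать, querypos=1, morphed)))"
    )
    print(lemmas_from_plan(plan))  # ['сталь', 'стать']
```

Keywords without the morphed flag are skipped, so a word the lemmatizer left unchanged yields an empty list with this pattern.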
