Sphinx
andrei2019, 2018-09-08 12:47:35

How to set up stemming in Sphinx?

I am using Sphinx 3.0.3. I created a simple test file:
Белая корова. Белыми коровами. Белой коровой.
The question is: why does Sphinx find these phrases for a form of the word "cow" that is not in the text, but not for two other variants of the word? It looks as if what is used is not stemming (truncating the word to its stem) but a dictionary that simply does not contain those two variants?
The config so far is:

morphology = stem_enru
min_word_len = 2
index_exact_words = 1
expand_keywords = 1
min_infix_len = 3
min_prefix_len = 3
#enable_star = 1 #removed
#min_word_len = 1  #removed
#dict = keywords #removed

Is it possible to set up the config so that these phrases are also found for the other variants of the word "cow"?
I don't specifically need cows right now; I just want to understand how to set this up, or learn what the limitations are.

2 answers
Puma Thailand, 2018-09-08
@opium

Stemming cuts off the endings; you need to go the other way. There is an article about Sphinx on Habr in my profile that describes an option for exactly your case.
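
To illustrate the point about stemming, here is a minimal Python sketch. It is a toy model, not Sphinx's actual stem_ru: the ending list `ENDINGS` and the helpers `toy_stem`, `build_index` and `matches` are hypothetical names made up for this example. It only shows the general idea that both indexed words and query words are reduced to stems, so a query matches when the stems are equal, while a truncated word is not automatically a stem.

```python
import re

# Hypothetical, deliberately tiny ending list (an assumption for illustration);
# a real Russian stemmer uses a much larger rule set.
ENDINGS = ("ами", "ой", "а", "ы")


def toy_stem(word: str) -> str:
    """Strip the longest listed ending, keeping at least three letters."""
    word = word.lower()
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word


def build_index(text: str) -> set:
    """Index the stems of all words in the text."""
    return {toy_stem(w) for w in re.findall(r"\w+", text)}


def matches(index: set, query: str) -> bool:
    """A single-word query matches if its stem is in the index."""
    return toy_stem(query) in index


TEXT = "Белая корова. Белыми коровами. Белой коровой."
INDEX = build_index(TEXT)

print(matches(INDEX, "коровы"))  # True: a form not in the text, same stem "коров"
print(matches(INDEX, "коро"))    # False: a truncated word is not a stem
```

In this toy model, finding documents by a partial word would need prefix matching on top of stemming, which is a separate mechanism.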

xmoonlight, 2018-09-08
@xmoonlight

Here is my finished algorithm (in PHP) for fuzzy search over strings of words with arbitrary word beginnings and endings, including automatic correction of look-alike characters to the target language.
PS: By the way, here is something similar to Sphinx: stumper.ru (apparently made recently).
Option 2: strip all suffixes from the search query with a regex, and the problem is solved:

-щик, -льщик
-анин, -янин
-ница, -тель
-льник
-ница
-ость, -есть
-ота, -ета
-ецо, -ице
-изна
-ство
-отня, -овня
-ство, -ество
-ина, -инка
-ёнок, -онок
-очка, -ечка, -ичка
-енька, -онька
-ушка, -юшка
-ышко
-ишко, -ишка
-ёнка, -онка
-инка, -енка
-ище, -ища
-ушк-, -юшк-, -ышк-
-ёнка, -онка, -ёнок, -онок, -юнок, -унок
-енька, -онька, -анька
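
A minimal sketch of Option 2 in Python (not the PHP algorithm mentioned above). The names `SUFFIXES`, `SUFFIX_RE` and `strip_suffixes` are hypothetical, only a handful of suffixes from the list are included, and the rule that at least two letters must remain before the suffix is an assumption added to avoid emptying short words.

```python
import re

# A small subset of the suffixes listed above; extend as needed.
SUFFIXES = [
    "щик", "льщик", "ость", "есть", "ство", "ество",
    "очка", "ечка", "ичка", "ушка", "юшка", "ёнок", "онок",
]

# The lazy group keeps the shortest possible word beginning (at least two
# letters, an assumption), so the longest listed suffix at the word end wins.
SUFFIX_RE = re.compile(
    r"\b(\w{2,}?)(?:" + "|".join(map(re.escape, SUFFIXES)) + r")\b"
)


def strip_suffixes(query: str) -> str:
    """Lowercase the query and cut a listed suffix off the end of each word."""
    return SUFFIX_RE.sub(r"\1", query.lower())


print(strip_suffixes("Белая коровушка"))  # белая коров
print(strip_suffixes("есть"))             # есть (too short to strip)
```

The sketch only handles suffixes exactly as listed, at the very end of a word; inflected forms with an extra ending after the suffix would need additional patterns.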
