Sphinx
Alexey Ukolov, 2017-07-10 17:23:50

Why are there strange characters in the index at the beginning of words?

I'm trying to get Sphinx to match queries correctly both with and without a hyphen, but it doesn't work: different settings give different results, and all of them are wrong :)
When I run the command
indextool --dumpdict services | grep верх
I see the following result:
[screenshot: d4e7c1af54a749c7bad5335b06a723b5.png]
As you can see, the word "верх" appears twice: first with some kind of symbol at the beginning, and then in its normal form.
When I run
indextool --dumpdict services | grep исетская
I get:
[screenshot: 727e06c1f8254f388fc481c8a50f20a5.png]
I'm searching for the phrase "Verkh-Isetskaya". In the database this service is called "UK Verkh-Isetskaya (CJSC) (commercial services)", and my task is to make the search work equally well with and without the hyphen. The database also contains several other services whose names contain variants such as "Verkh-Isetsky", and for some reason they are ranked higher than the one I'm looking for.
Could these stray characters at the beginning of keywords be the cause of my problem, and how do I deal with this behavior?
Here is my config, just in case:

index services
{
    source = services
    path = /var/lib/sphinxsearch/data/services
    docinfo = extern
    morphology = stem_enru
    min_stemming_len = 1
    min_word_len = 1
    min_infix_len = 1
    html_strip = 1
    index_exact_words = 1
    expand_keywords = 1
    mlock = 0
    charset_table = 0..9, A..Z->a..z, a..z, U+2C->U+2E, U+2E, U+0044, U+0046, U+0130, U+0401->U+0435, U+0451->U+0435, U+410..U+42F->U+430..U+44F, U+430..U+44F
}
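
As a rough illustration of what the charset_table above does to the service name, here is a minimal Python sketch. It is not Sphinx code: it only assumes the documented rule that characters missing from charset_table act as separators, and it ignores stemming, infixes and every other setting in the config. The helper names (parse_char, parse_range, build_table, tokenize) are made up for this example, and the "later entry wins" behavior is an assumption.

```python
# charset_table copied verbatim from the config above.
CHARSET_TABLE = (
    "0..9, A..Z->a..z, a..z, U+2C->U+2E, U+2E, U+0044, U+0046, U+0130, "
    "U+0401->U+0435, U+0451->U+0435, U+410..U+42F->U+430..U+44F, U+430..U+44F"
)


def parse_char(token):
    """Turn 'a' or 'U+0435' into a code point."""
    token = token.strip()
    if token.upper().startswith("U+"):
        return int(token[2:], 16)
    return ord(token)


def parse_range(text):
    """Turn 'A..Z' or 'U+2E' into an inclusive (start, end) pair."""
    parts = text.split("..")
    start = parse_char(parts[0])
    end = parse_char(parts[1]) if len(parts) == 2 else start
    return start, end


def build_table(spec):
    """Map each allowed code point to the code point it is folded to."""
    table = {}
    for entry in spec.split(","):
        src, _, dst = entry.partition("->")
        src_start, src_end = parse_range(src)
        dst_start = parse_range(dst)[0] if dst else src_start
        for offset in range(src_end - src_start + 1):
            # Assumption: a later entry overrides an earlier one.
            table[src_start + offset] = dst_start + offset
    return table


def tokenize(text, table):
    """Fold characters through the table; any unlisted character ends a token."""
    tokens, current = [], []
    for ch in text:
        mapped = table.get(ord(ch))
        if mapped is None:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(chr(mapped))
    if current:
        tokens.append("".join(current))
    return tokens


TABLE = build_table(CHARSET_TABLE)
print(tokenize("Верх-Исетская", TABLE))  # ['верх', 'исетская']
print(tokenize("Верх Исетская", TABLE))  # ['верх', 'исетская']
```

Under these assumptions the hyphen (U+2D) is not in the table, so the hyphenated and the space-separated spelling split into the same two keywords, which matches the two separate greps above.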

1 answer(s)
Puma Thailand, 2017-07-10
@opium

If you want the original form to outrank everything else in Sphinx, there is an option for that; see my article about Sphinx on Habr.
