NamnaR, 2018-10-28 09:51:31
Sphinx

How to fix errors in wordforms.txt?

Good day!
I am seeing some strange warnings in searchd.log:
1. duplicate wordform found - overridden
2. all source tokens are stopwords
3. no destination token found
Examples of these warnings:

[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='gee > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='je > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='же > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='жи > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='жэ > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.

[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='джепи > GP', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='джипи > GP', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='джыпи > GP', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.


[Sat Oct 27 23:24:34.833 2018] [27077] WARNING: index 'bitrix': duplicate wordform found - overridden ( current='defendere > Defender Pilot', old='defender pilot > defender pilot pilot' ). Fix your wordforms file '/etc/sphinxsearch/wordforms/wordforms.txt'.
[Sat Oct 27 23:24:34.833 2018] [27077] WARNING: index 'bitrix': duplicate wordform found - overridden ( current='defendr > Defender Pilot', old='defender pilot > defender pilot pilot' ). Fix your wordforms file '/etc/sphinxsearch/wordforms/wordforms.txt'.
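
To see at a glance which wordform rules trigger which warning, the log lines can be grouped by message. This is only a minimal Python sketch: the regular expression is an assumption based on the sample lines above, not a documented searchd log grammar, and the function name `group_wordform_warnings` is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern derived only from the sample WARNING lines in this post (assumption).
WARNING_RE = re.compile(
    r"WARNING: index '(?P<index>[^']+)': (?P<reason>[^(]+?) \(\s*"
    r"(?:wordform|current)='(?P<rule>[^']*)'"
)


def group_wordform_warnings(lines):
    """Return {warning text: sorted list of unique wordform rules}."""
    grouped = defaultdict(set)
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            grouped[match.group("reason")].add(match.group("rule"))
    return {reason: sorted(rules) for reason, rules in grouped.items()}


# Two of the lines quoted above, used as demo input.
demo_lines = [
    "[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': "
    "no destination token found (wordform='gee > GE', "
    "file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.",
    "[Sat Oct 27 23:24:34.833 2018] [27077] WARNING: index 'bitrix': "
    "duplicate wordform found - overridden ( current='defendr > Defender Pilot', "
    "old='defender pilot > defender pilot pilot' ). "
    "Fix your wordforms file '/etc/sphinxsearch/wordforms/wordforms.txt'.",
]

for reason, rules in group_wordform_warnings(demo_lines).items():
    print(reason, "->", rules)
```

The function accepts any iterable of lines, so an open file object for searchd.log can be passed in directly.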

My sphinx.conf:
index bitrix
{
    #main settings
    source = bitrix
    type = rt
    path = /var/lib/sphinxsearch/data/bitrix
    #docinfo = inline
    wordforms = /etc/sphinxsearch/wordforms/wordforms.txt
    #exceptions = /etc/sphinxsearch/exceptions/exceptions.txt

    #choose appropriate type of morphology to use
    #morphology = lemmatize_ru_all, lemmatize_en_all, lemmatize_de_all, stem_enru
    #morphology = lemmatize_ru_all, lemmatize_en_all
    morphology = stem_enru, soundex

    #these settings are used by bitrix:search.title component
    prefix_fields = title
    infix_fields =
    #min_prefix_len = 2

    rt_mem_limit = 512M
    ondisk_attrs = 1

    #min_prefix_len = 3
    min_word_len = 3
    #min_infix_len = 1
    min_stemming_len = 3

    expand_keywords = 1
    index_exact_words = 1

    #enable_star = 1

    #all fields must be defined exactly as followed
    rt_field = title
    rt_field = body
    rt_attr_uint = module_id
    rt_attr_string = module
    rt_attr_uint = item_id
    rt_attr_string = item
    rt_attr_uint = param1_id
    rt_attr_string = param1
    rt_attr_uint = param2_id
    rt_attr_string = param2
    rt_attr_timestamp = date_change
    rt_attr_timestamp = date_to
    rt_attr_timestamp = date_from
    rt_attr_uint = custom_rank
    rt_attr_multi = tags
    rt_attr_multi = right
    rt_attr_multi = site
    rt_attr_multi = param

    #depends on settings of your site
    # uncomment for single byte character set
    #charset_type = sbcs
    # uncomment for UTF character set
    # charset_type = utf-8
    charset_table = 0..9, A..Z->a..z, x->U+0445, c->U+0441, _, a..z, \
        U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+0435, U+451->U+0435
    blend_chars = U+002C, U+2010, U+2012, U+2013, U+2014, U+2044, U+002F, U+002D, U+2d, /
}
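
One thing that stands out when comparing the warnings with this config: every rule reported as IGNORED contains a two-character token (je, же, жи, жэ as sources; GE, GP as destinations), while the index sets min_word_len = 3. Whether that is the actual cause is an assumption; the log itself does not confirm it. A minimal Python sketch to list such rules, assuming each line of wordforms.txt has the form `source > destination` (or `source => destination`); the name `short_tokens` is made up for illustration:

```python
import re

MIN_WORD_LEN = 3  # value of min_word_len in the index config above


def short_tokens(rule, min_len=MIN_WORD_LEN):
    """Return the tokens of one wordform rule that are shorter than min_len.

    Only whitespace splitting is done here; charset_table folding and
    morphology are deliberately ignored (simplifying assumption).
    """
    parts = re.split(r"\s*=?>\s*", rule.strip(), maxsplit=1)
    if len(parts) != 2:
        return []  # not a "source > destination" line
    source, destination = parts
    return [tok for tok in source.split() + destination.split() if len(tok) < min_len]


# Rules taken from the warnings quoted in this post.
for rule in ["gee > GE", "je > GE", "джепи > GP", "defendere > Defender Pilot"]:
    print(rule, "->", short_tokens(rule))
```

If the short tokens do turn out to be the problem, the rules listed by this check are the ones to review against min_word_len.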

Can you please tell me how to fix these errors in wordforms.txt?
Thanks in advance!
