How do I fix errors in wordforms.txt?
Good day!
I'm seeing some very strange warnings in searchd.log:
1. duplicate wordform found - overridden
2. all source tokens are stopwords
3. no destination token found
Examples of these warnings:
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='gee > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='je > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='же > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='жи > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': all source tokens are stopwords (wordform='жэ > GE', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='джепи > GP', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='джипи > GP', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:22:12.418 2018] [27077] WARNING: index 'bitrix': no destination token found (wordform='джыпи > GP', file='/etc/sphinxsearch/wordforms/wordforms.txt'). IGNORED.
[Sat Oct 27 23:24:34.833 2018] [27077] WARNING: index 'bitrix': duplicate wordform found - overridden ( current='defendere > Defender Pilot', old='defender pilot > defender pilot pilot' ). Fix your wordforms file '/etc/sphinxsearch/wordforms/wordforms.txt'.
[Sat Oct 27 23:24:34.833 2018] [27077] WARNING: index 'bitrix': duplicate wordform found - overridden ( current='defendr > Defender Pilot', old='defender pilot > defender pilot pilot' ). Fix your wordforms file '/etc/sphinxsearch/wordforms/wordforms.txt'.
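To see how many entries each warning type affects, the log can be tallied with a short script. This is a minimal sketch: the regular expression is inferred only from the sample lines quoted above, not from any documented searchd log format, and `count_warnings` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines above (an assumption, not a spec):
#   ... WARNING: index '<name>': <warning text> (<details>) ...
WARNING_RE = re.compile(
    r"WARNING: index '(?P<index>[^']+)': (?P<kind>[^(]+?)\s*\("
)


def count_warnings(lines):
    """Count warnings per (index name, warning text) pair."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("index"), match.group("kind"))] += 1
    return counts
```

To run it against a real log, pass an open file, for example `count_warnings(open(path, encoding="utf-8"))`, where `path` points at your own searchd.log, then print `counts.most_common()`.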
Index configuration:
index bitrix
{
#main settings
source = bitrix
type = rt
path = /var/lib/sphinxsearch/data/bitrix
#docinfo = inline
wordforms = /etc/sphinxsearch/wordforms/wordforms.txt
#exceptions = /etc/sphinxsearch/exceptions/exceptions.txt
#choose appropriate type of morphology to use
#morphology = lemmatize_ru_all, lemmatize_en_all, lemmatize_de_all, stem_enru
#morphology = lemmatize_ru_all, lemmatize_en_all
morphology = stem_enru, soundex
#these settings are used by bitrix:search.title component
prefix_fields = title
infix_fields =
#min_prefix_len = 2
rt_mem_limit = 512M
ondisk_attrs = 1
#min_prefix_len = 3
min_word_len = 3
#min_infix_len = 1
min_stemming_len = 3
expand_keywords = 1
index_exact_words = 1
#enable_star = 1
#all fields must be defined exactly as followed
rt_field = title
rt_field = body
rt_attr_uint = module_id
rt_attr_string = module
rt_attr_uint = item_id
rt_attr_string = item
rt_attr_uint = param1_id
rt_attr_string = param1
rt_attr_uint = param2_id
rt_attr_string = param2
rt_attr_timestamp = date_change
rt_attr_timestamp = date_to
rt_attr_timestamp = date_from
rt_attr_uint = custom_rank
rt_attr_multi = tags
rt_attr_multi = right
rt_attr_multi = site
rt_attr_multi = param
#depends on settings of your site
# uncomment for single byte character set
#charset_type = sbcs
# uncomment for UTF character set
# charset_type = utf-8
charset_table = 0..9, A..Z->a..z, x->U+0445, c->U+0441, _, a..z, \
U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+0435, U+451->U+0435
blend_chars = U+002C, U+2010, U+2012, U+2013, U+2014, U+2044, U+002F, U+002D, U+2d, /
}
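One pattern that may be worth checking, as a guess rather than a confirmed cause: the config sets min_word_len = 3, and every wordform in the "all source tokens are stopwords" and "no destination token found" warnings above has a two-character side (je, же, жи, жэ, GE, GP). The sketch below lists wordform lines where every token on one side is shorter than that limit. The function names are hypothetical, the "source > destination" and "=>" separators are the only syntax handled, and treating lines that start with "#" as comments is an assumption.

```python
MIN_WORD_LEN = 3  # mirrors min_word_len in the index config above


def split_wordform(line):
    """Split 'source > destination' (or '=>') into two token lists.

    Returns None for blank lines, '#' lines (assumed to be comments),
    and lines without a separator.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    for sep in ("=>", ">"):
        if sep in text:
            source, _, dest = text.partition(sep)
            return source.split(), dest.split()
    return None


def short_token_report(lines, min_len=MIN_WORD_LEN):
    """Yield (line, side) when every token on that side is shorter than min_len."""
    for line in lines:
        parsed = split_wordform(line)
        if parsed is None:
            continue
        source, dest = parsed
        if source and all(len(tok) < min_len for tok in source):
            yield line.strip(), "source"
        if dest and all(len(tok) < min_len for tok in dest):
            yield line.strip(), "destination"
```

Run on the entries from the log, this flags "gee > GE" and "джепи > GP" on the destination side and "же > GE" on both sides, while "defendr > Defender Pilot" passes. If the flagged lines match the warnings in your log, the length limit is a likely suspect; if they do not, the cause is elsewhere.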