Sphinx
edo1h, 2021-01-06 08:22:25

How to properly support Arabic in Manticore?

I'm trying to migrate a Sphinx config to Manticore.
The indexed strings are in different languages, so I want the most universal config possible.

What I had in Sphinx:

# chinese lang setting
ngram_len = 1
ngram_chars = U+3000..U+2FA1F
        
# ignore arabic chars
ignore_chars = U+0640, U+064B..U+065F,U+06D6..U+06DC,U+06DF..U+06E8,U+06EA..U+06ED

charset_table = /many, many codes from different languages/


Sphinxsearch.com/wiki/doku.php?id=charset_tables#arabic says:
It's necessary to add ignore_chars to ignore vowels, black space and other Arabic signs


In the Manticore documentation, the recommended config is more compact:
ngram_len = 1
ngram_chars = cjk
charset_table = non_cjk


Question: Will the absence of ignore_chars break Arabic searches?


1 answer(s)
ManticoreSearch, 2021-01-06
@edo1h

It may. If search quality for Arabic text is critical, it is better to keep these settings as they were in Sphinx.
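
To illustrate why dropping ignore_chars can matter, here is a minimal Python sketch. It is not Manticore code and does not call any Manticore API. It only imitates the documented effect of ignore_chars, which is that the listed characters are removed from the text as if they were never there. The helper names (parse_ignore_chars, strip_ignored) are hypothetical, and the sample word is just an illustration. The sketch shows that a vocalized Arabic word and its plain spelling become identical once the listed marks are stripped. Whether the non_cjk alias handles these marks on its own is not something this sketch can tell you. If they are neither ignored nor mapped in charset_table, they would be treated as separators and could split the word.

```python
import re

# The ignore_chars value from the Sphinx config quoted in the question.
IGNORE_CHARS = "U+0640, U+064B..U+065F,U+06D6..U+06DC,U+06DF..U+06E8,U+06EA..U+06ED"


def parse_ignore_chars(spec):
    """Expand 'U+XXXX' and 'U+XXXX..U+YYYY' items into a set of code points.

    Hypothetical helper: it only understands the two forms used above.
    """
    codepoints = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"U\+([0-9A-Fa-f]+)(?:\.\.U\+([0-9A-Fa-f]+))?", item)
        if match is None:
            raise ValueError(f"unsupported item: {item!r}")
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        codepoints.update(range(start, end + 1))
    return codepoints


def strip_ignored(text, ignored):
    """Drop every character whose code point is in the ignored set."""
    return "".join(ch for ch in text if ord(ch) not in ignored)


IGNORED = parse_ignore_chars(IGNORE_CHARS)

# The same word written three ways (assumed sample, spelled with escapes
# so that the exact code points are visible).
plain = "\u0643\u062A\u0627\u0628"                  # no diacritics
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628"  # with kasra and fatha
stretched = "\u0643\u0640\u062A\u0627\u0628"        # with tatweel (U+0640)

if __name__ == "__main__":
    for variant in (plain, vocalized, stretched):
        print(strip_ignored(variant, IGNORED) == plain)
```

Running the same check against your own sample strings is a quick way to see how many of your documents or queries actually contain these marks before deciding whether to keep the ignore_chars line.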
