elasticsearch
see613, 2014-06-02 17:57:35

How to configure Elasticsearch for Russian morphology?

Good afternoon.
I'm trying to set up Elasticsearch for Russian morphology, but so far without success.
I use Elastica for PHP and the elasticsearch-analysis-morphology plugin (in case it matters, this is on localhost on Windows).
The search works, but it doesn't always find words whose endings have changed.
Can it combine wildcards with morphology? For example, entering "*save*" should find "saved".
Also, I use html_strip when indexing, but the search results still come back with HTML tags.
My index settings and mapping:

    public $indexSettings = array(
        'analysis' => array(
            'analyzer' => array(
                self::INDEX_ANALYZER => array(
                    'type' => 'custom',
                    'tokenizer' => 'standard',
                    'filter' => array('lowercase', 'russian_morphology', 'english_morphology'),
                    'char_filter' => array('html_strip')
                ),
                self::SEARCH_ANALYZER => array(
                    'type' => 'custom',
                    'tokenizer' => 'standard',
                    'filter' => array('lowercase', 'russian_morphology', 'english_morphology')
                ),
                self::SEARCH_STRICT_ANALYZER => array(
                    'type' => 'custom',
                    'tokenizer' => 'standard',
                    'filter' => array('lowercase')
                )
            )
        )
    );
    public $mappingSettings = array(      
        'title'=>array(
            'type'=>'string',
            'include_in_all'=>true,
            'analyzer'=>self::INDEX_ANALYZER,
            'boost'=>100
        ),
        'content'=>array(
            'type'=>'string',
            'include_in_all'=>true,
            'analyzer'=>self::INDEX_ANALYZER,
            'boost'=>1
        ),
        'url'=>array(
            'type'=>'string',
            "index"=>"not_analyzed",
            'include_in_all'=>false,
        ),
    );

My search query:
Elastica\Query Object
(
    [_params:protected] => Array
        (
            [query] => Array
                (
                    [query_string] => Array
                        (
                            [query] => ваш
                            [analyzer] => searchAnalyzer
                            [fields] => Array
                                (
                                    [0] => title^100
                                    [1] => content^1
                                )
                        )
                )
            [highlight] => Array
                (
                    [fields] => Array
                        (
                            [content] => Array
                                (
                                    [fragment_size] => 100
                                    [number_of_fragments] => 5
                                )
                        )
                    [pre_tags] => Array
                        (
                            [0] => *
                        )
                    [post_tags] => Array
                        (
                            [0] => *
                        )
                )
        )
    [_suggest:protected] => 0
    [_rawParams:protected] => Array
        (
        )
)

If it helps, my classes are:
pastebin.com/wXzjSZ6q
pastebin.com/YGxXSkj0
pastebin.com/ptiTVt0Q

1 answer(s)
Pavel Solovyov, 2014-06-02
@pavel_salauyou

Add the snowball filter, and to separate words from tags, try the word_delimiter filter.
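
A rough sketch of what that suggestion could look like in the index settings from the question. The filter name `russian_snowball` and the analyzer name `indexAnalyzer` are made up for illustration. Keeping the plugin's morphology filters alongside snowball, and the order of the filters, are assumptions rather than something the answer specifies, so it is worth checking the resulting tokens with the _analyze API before relying on it.

```php
<?php
// Sketch only: analysis settings with the suggested filters added.
// 'russian_snowball' and 'indexAnalyzer' are hypothetical names.
$indexSettings = array(
    'analysis' => array(
        'filter' => array(
            // Built-in snowball stemmer, configured for Russian.
            'russian_snowball' => array(
                'type' => 'snowball',
                'language' => 'Russian',
            ),
        ),
        'analyzer' => array(
            'indexAnalyzer' => array(
                'type' => 'custom',
                // Strips HTML from the text before it is tokenized.
                'char_filter' => array('html_strip'),
                'tokenizer' => 'standard',
                'filter' => array(
                    'word_delimiter',      // built-in filter, default settings
                    'lowercase',
                    'russian_morphology',  // from the morphology plugin, as in the question
                    'english_morphology',
                    'russian_snowball',    // assumption: applied last
                ),
            ),
        ),
    ),
);
```

Keep in mind that html_strip only affects the tokens that go into the index. The stored _source is returned unchanged, so tags in the results probably have to be removed before indexing or when rendering the output.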
