Anton Ivanov, 2016-08-09 11:11:06
elasticsearch

How to reduce the impact of russian_morphology on indexing speed in Elasticsearch 2.4?

Hello.
I use this plugin: https://github.com/imotov/elasticsearch-analysis-m...
And here is the index setup:

index:
  number_of_shards: 5

  analysis:
    char_filter:
      ru:
        type: mapping
        mappings: ['Ё=>Е', 'ё=>е']
    analyzer:
      default_index:
        alias: [index_ru]
        type: custom
        tokenizer: nGram
        filter: [stopwords_ru, stop, custom_word_delimiter, lowercase, russian_morphology, english_morphology]
        char_filter: [ru]
      default_search:
        alias: [search_ru]
        type: custom
        tokenizer: standard
        filter: [stopwords_ru, stop, custom_word_delimiter, lowercase, russian_morphology, english_morphology]
        char_filter: [ru]
    tokenizer:
      nGram:
        type: nGram
        min_gram: 4
        max_gram: 20
    filter:
      stopwords_ru:
        type: stop
        stopwords: [а,без,более,бы,был,была,были,было,быть,в,вам,вас,весь,во,вот,все,всего,всех,вы,где,да,даже,для,до,его,ее,если,есть,еще,же,за,здесь,и,из,или,им,их,к,как,ко,когда,кто,ли,либо,мне,может,мы,на,надо,наш,не,него,нее,нет,ни
        ignore_case: true
      custom_word_delimiter:
        type: word_delimiter
        # "PowerShot" ⇒ "Power" "Shot"; parts of a single word become separate tokens
        generate_word_parts: true
        generate_number_parts: true  # "500-42" ⇒ "500" "42"
        catenate_words: true  # "wi-fi" ⇒ "wifi"
        catenate_numbers: false  # "500-42" ⇒ "50042"
        catenate_all: true  # "wi-fi-4000" ⇒ "wifi4000"
        split_on_case_change: true  # "PowerShot" ⇒ "Power" "Shot"
        preserve_original: true  # "500-42" ⇒ "500-42" "500" "42"
        split_on_numerics: false  # "j2se" ⇒ "j" "2" "se"

Indexing new documents is rather slow, about 10 documents per second.
When I disable morphology analysis, the speed increases several-fold.
Can I somehow speed up the indexing of new documents, or is this a normal speed?
I'm testing on a MacBook (Late 2013, Core i7 2.6 GHz) with 5 GB allocated to Elasticsearch.
Thanks in advance.
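
For a rough sense of how much work the morphology filters receive with this setup, here is a minimal Python sketch that counts the tokens an nGram tokenizer with min_gram: 4 and max_gram: 20 would hand to the filter chain. It assumes the tokenizer emits every substring of the input between those two lengths (my understanding of the behaviour when token_chars is not set, not something verified against this exact version), and the function names are hypothetical helpers, not part of Elasticsearch or the plugin.

```python
from typing import List


def ngram_count(text_length: int, min_gram: int = 4, max_gram: int = 20) -> int:
    """Count the substrings of length min_gram..max_gram in a text of the given length.

    Assumption: the tokenizer emits every such substring as a separate token.
    """
    return sum(
        max(text_length - n + 1, 0)
        for n in range(min_gram, max_gram + 1)
    )


def ngrams(text: str, min_gram: int = 4, max_gram: int = 20) -> List[str]:
    """List the substrings themselves, for inspecting a short example."""
    return [
        text[start:start + n]
        for start in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if start + n <= len(text)
    ]


if __name__ == "__main__":
    print(ngrams("PowerShot")[:5])
    # Under the assumption above, a 1000-character field yields 16813 tokens,
    # and each of them passes through every filter in default_index.
    print(ngram_count(1000))
```

If that assumption holds, each document sends thousands of tokens through russian_morphology and english_morphology at index time, while default_search with the standard tokenizer produces roughly one token per word.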
