What is the best way to set up Elasticsearch for the Russian language in a Rails project?
I'm trying to set up Elasticsearch to search a Russian-language dictionary of terms, but for some reason it behaves strangely. Say the dictionary contains these terms: fire, fire safety, fire hazard of substances, fire protection, fire detector, fire hydrant, fire hazardous area. When I search for the word "fire", it returns only part of the list: fire detector, fire hydrant, fire hazardous area.
I've been trying to figure this out on my own and have hit a dead end.
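To make the expected result concrete, here is a minimal sketch in plain Ruby with no Elasticsearch involved. The helper names `dictionary_terms` and `expected_matches` are hypothetical, and treating "search for the word" as a case-insensitive whole-word match is my assumption about the desired behavior.

```ruby
# The terms from the example above (English renderings of the dictionary entries).
def dictionary_terms
  [
    "fire",
    "fire safety",
    "fire hazard of substances",
    "fire protection",
    "fire detector",
    "fire hydrant",
    "fire hazardous area"
  ]
end

# Hypothetical helper: returns every term that contains the query as a whole
# word, ignoring case. This is only a statement of the expected result, not
# a model of how Elasticsearch scores or analyzes text.
def expected_matches(terms, query)
  needle = query.downcase
  terms.select { |term| term.downcase.split.include?(needle) }
end

puts expected_matches(dictionary_terms, "fire")
```

Under that assumption, a search for "fire" should return all seven terms, not just three.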
Elasticsearch version: 2.1.0
The Russian morphology plugin is also installed: https://github.com/imotov/elasticsearch-analysis-m...
So is https://github.com/royrusso/elasticsearch-HQ
These gems are used:
gem 'elasticsearch-model'
gem 'elasticsearch-rails'
Here are my files:
models/dictionary.rb
class Dictionary < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  index_name "dictionary"

  mapping do #dynamic: 'false'
    [:title].each do |attribute|
      indexes attribute, type: 'string'
    end
  end

  def self.search(query)
    Dictionary.__elasticsearch__.search(
      {
        query: {
          multi_match: {
            query: "#{query}&pretty",
            fields: ['title']
          }
        },
        sort: [{ title: { order: "asc" } }]
      }
    )
  end
end
if Dictionary.__elasticsearch__.client.indices.index_exists?
  Dictionary.import # for auto sync model with elastic search
else
  Dictionary.__elasticsearch__.client.indices.delete index: Dictionary.index_name rescue nil
  # Create the new index with the new mapping
  Dictionary.__elasticsearch__.client.indices.create(index: Dictionary.index_name)
end
Elasticsearch::Model.client = Elasticsearch::Client.new url: "http://localhost:9200/"
Dictionary.__elasticsearch__.client.indices.delete index: Dictionary.index_name rescue nil
index:
  number_of_shards: 5
  number_of_replicas: 1
  analysis:
    char_filter:
      ru:
        type: mapping
        mappings: ['Ё=>Е', 'ё=>е']
    analyzer:
      default_index:
        alias: [index_ru]
        type: custom
        tokenizer: nGram
        filter: [stopwords_ru, stop, custom_word_delimiter, lowercase, snowball, russian_morphology, english_morphology]
        char_filter: [ru]
      default_search:
        alias: [search_ru]
        type: custom
        tokenizer: standard
        filter: [stopwords_ru, stop, custom_word_delimiter, lowercase, snowball, russian_morphology, english_morphology]
        char_filter: [ru]
    tokenizer:
      nGram:
        type: nGram
        min_gram: 3
        max_gram: 20
    filter:
      stopwords_ru:
        type: stop
        stopwords: [а,без,более,бы,был,была,были,было,быть,в,вам,вас,весь,во,вот,все,всего,всех,вы,где,да,даже,для,до,его,ее,если,есть,еще,же,за,здесь,и,из,или,им,их,к,как,ко,когда,кто,ли,либо,мне,может,мы,на,надо,наш,не,него,нее,нет,ни,них,но,ну,о,об,однако,он,она,они,оно,от,очень,по,под,при,с,со,так,также,такой,там,те,тем,то,того,тоже,той,только,том,ты,у,уже,хотя,чего,чей,чем,что,чтобы,чье,чья,эта,эти,это,я]
        ignore_case: true
      custom_word_delimiter:
        type: word_delimiter
        # "PowerShot" ⇒ "Power" "Shot", parts of one word become separate tokens
        generate_word_parts: true
        generate_number_parts: true # "500-42" ⇒ "500" "42"
        catenate_words: true # "wi-fi" ⇒ "wifi"
        catenate_numbers: false # "500-42" ⇒ "50042"
        catenate_all: true # "wi-fi-4000" ⇒ "wifi4000"
        split_on_case_change: true # "PowerShot" ⇒ "Power" "Shot"
        preserve_original: true # "500-42" ⇒ "500-42" "500" "42"
        split_on_numerics: false # "j2se" ⇒ "j" "2" "se"
      snowball:
        type: snowball
        language: Russian
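For reference, here is a rough sketch of what I understand character n-grams with min_gram 3 and max_gram 20 to mean. It is plain Ruby, not the Elasticsearch implementation. The name `char_ngrams` is hypothetical, and the sketch covers only the tokenizer step; none of the filters (stop words, word delimiter, snowball, morphology) are modelled.

```ruby
# Simplified character n-grams: every substring whose length lies between
# min_gram and max_gram, taken at each starting position of the text.
# Assumption: this approximates the tokenizer settings above for one word.
def char_ngrams(text, min_gram: 3, max_gram: 20)
  chars = text.chars
  (0...chars.length).flat_map do |start|
    (min_gram..max_gram).map do |len|
      # Skip lengths that would run past the end of the text.
      chars[start, len].join if start + len <= chars.length
    end.compact
  end
end

p char_ngrams("fire") # => ["fir", "fire", "ire"]
```

So at index time a word is stored as many overlapping fragments, while the search analyzer above uses the standard tokenizer and does not split the query into n-grams.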