PostgreSQL
Andrew, 2018-01-29 18:46:30

How to implement a search by last name in the database?

Suppose there is a table with two fields: id and data. The data field stores arbitrary Russian text, which can contain first names and surnames. I need to implement a search over this table.
Search requirements:

  • Independence from word form, i.e. grammatical case and gender ("Ivanova" should match "Ivanov", "Ivanovich")
  • Tolerance for typos ("simenov" should match "semyonov")
  • Relevance ranking

I tried PostgreSQL full-text search and the pg_trgm extension, but neither works as needed:
SELECT to_tsvector('russian', 'Анна Иванова') @@ to_tsquery('russian', 'иванов') -- false
SELECT to_tsvector('russian', 'Иван Иванов') @@ to_tsquery('russian', 'иванова') -- false

SELECT similarity('иванов', 'иванова') -- 0.66
SELECT similarity('иванов', 'ивановым') -- 0.6
SELECT similarity('иваныч', 'иванович') -- 0.33
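
To see where the similarity() values above come from, here is a minimal Python sketch of trigram similarity as pg_trgm describes it: each word is lowercased, padded with two spaces in front and one behind, split into three-character windows, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names are illustrative, and splitting words with \w+ is only an approximation of how pg_trgm tokenizes text.

```python
import re


def trigrams(text: str) -> set:
    """Return the set of pg_trgm-style trigrams for a string (approximation)."""
    result = set()
    # Lowercase the text and split it into words on non-word characters.
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


pairs = [("иванов", "иванова"), ("иванов", "ивановым"), ("иваныч", "иванович")]
for left, right in pairs:
    print(left, right, round(similarity(left, right), 2))
```

For the three pairs this gives 6/9, 6/10 and 4/12, which agrees with the numbers above up to rounding. Short surnames share few trigrams once an ending changes, so a fixed similarity threshold is hard to pick.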

Elasticsearch doesn't find anything either:
client.CreateIndex(
  index,
  m => m.Mappings(mp =>
    mp.Map<Page>(mx =>
      mx.Properties(p =>
        p.Text(x =>
          x.Name(f => f.Title)
          .Analyzer("index_ru")
          .SearchAnalyzer("search_ru")
        )
      )
    )
  )
  .Settings(s =>
    s.Analysis(a => 
      a.CharFilters(c =>
        c.Mapping("filter_ru_e", z => z.Mappings("Ё => Е", "ё => е"))
      )
      .Tokenizers(t =>
        t.NGram("n_gram", ng =>
          ng.MinGram(4).MaxGram(20)
        )
      )
      .Analyzers(an => 
        an.Custom("index_ru", ac =>
          ac.CharFilters("html_strip", "filter_ru_e")
          .Tokenizer("n_gram")
          .Filters("stop", "lowercase", "russian_morphology", "english_morphology")
        )
        .Custom("search_ru", ac =>
          ac.CharFilters("html_strip", "filter_ru_e")
          .Tokenizer("standard")
          .Filters("stop", "lowercase", "russian_morphology", "english_morphology")
        )
      )
    )
  )
);

var docs = new []
{
  new Page("Иван Иванов"),
  new Page("Петр Иванов"),
  new Page("Илья Иванов"),
  new Page("Светлана Иванова"),
  new Page("Анна Иванова"),
};

foreach(var doc in docs)
  client.Index(doc);
  
var query = client.Search<Page>(
  s => s.Query(
    q => q.Match(
      f => f.Field(x => x.Title)
          .Query("иванов")
    )
  )
);

Isn't there a ready-made tool that works as it should?


2 answer(s)
#, 2018-01-29
@mindtester

And why do you think Cortana still hasn't been released in Russian?
P.S. This isn't about the Russian language being unique; the problem simply hasn't become trivial at this stage of IT development.
Take a look at the services and tools at https://dadata.ru/, something there may come in handy.

Alexey Cheremisin, 2018-01-30
@leahch

Strange. I use the built-in russian analyzer for Russian-language fields, and out of the box it handles word forms quite well, although I haven't tested it on surnames. Also try a fuzzy query.
I just checked: searching for "swan sensor" finds "Lamp Swans night light with light sensor Cosmos".
Searching for "sites" returns "A gray platform and a round black handle with a lock".
Searching for "lamp" finds the word in its various forms, singular and plural.
Searching for "nail" returns "a plastic staple with a nail".
P.S. This is Elasticsearch 5.1.1, if that matters. I haven't deliberately installed any Russian morphology plugins since version 2.x.
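
For context on the fuzzy query suggestion: Elasticsearch measures fuzziness as edit distance, and with fuzziness set to AUTO it allows no edits for terms of 1-2 characters, one edit for 3-5 characters, and two edits for longer terms. The Python sketch below only illustrates that idea and is not how Elasticsearch implements it. It uses plain Levenshtein distance (no transpositions), the function names are made up, and the Russian spellings "сименов" and "семёнов" are an assumption about the typo example in the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits by term length, mirroring the AUTO thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query: str, candidate: str) -> bool:
    """True if candidate is within the allowed edit distance of query."""
    query, candidate = query.lower(), candidate.lower()
    return levenshtein(query, candidate) <= auto_fuzziness(query)


# Assumed Russian spellings of the typo example from the question.
print(levenshtein("сименов", "семёнов"), fuzzy_match("сименов", "семёнов"))
```

Two substitutions separate the misspelled and correct surnames here, so a seven-letter query still matches, whereas a four-letter query would tolerate only one edit.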
