How to autocomplete addresses in Elasticsearch?

ordinary_pavel, 2017-07-21 14:14:57

MySQL

Good day! We needed to implement address autocomplete. To do this, we decided to take addresses from the FIAS database and index them in Elasticsearch as a single address string such as "Magadan Region, Magadan, Proletarskaya St., 117".
I run a full-text search against this string.
I use a custom analyzer with the following settings:
https://gist.github.com/anonymous/dc84e31ff7f40ea3...
For queries I use match_phrase_prefix:
"match_phrase_prefix": {
  "plaintext": {
    "query": "lenin 7",
    "analyzer": "address"
  }
}
The results look more or less sane. However, there are two problems.
1) How can I ignore the order of the words in the searched phrase? Are there options other than slop?
2) When searching by prefix with a query like "Lenina 1", the results "Lenina d. 1", "Lenina d. 12", "Lenina d. 113", etc. all get the same score (I assume that Elasticsearch compares the string byte by byte when matching a prefix and, as soon as it finds a match for a token, includes the result in the output regardless of what follows the match). As a result, for the query "Lenina 1" with a limit of 10 results, it can be impossible to get house 1 at all, because the results are crowded out by houses such as "Lenina d. 112" and "Lenina d. 113".
How can I give a bonus for an exact match?
I also tried a suggest field, but it did not search well, since you cannot run a normal query against the input.
In general, building autocomplete on top of full-text search does not seem like the best idea. Does anyone have ideas on how to do address autocomplete properly?
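One possible shape for a query that touches both problems above, as a sketch rather than a tested recipe. It assumes Elasticsearch 7.2 or later, where the `match_bool_prefix` query is available (it treats the last term as a prefix and ignores word order), and it reuses the `plaintext` field and `address` analyzer from the question. The helper name `build_address_query` and the boost value are made up for illustration. The `should` clause adds score only when every query term matches a whole token, which is meant to push "Lenina d. 1" above "Lenina d. 12".

```python
import json


def build_address_query(text, field="plaintext", analyzer="address", size=10):
    """Build a search body: order-independent prefix match plus an exact-token bonus."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Every term must be present in any order; the last term matches as a prefix.
                "must": [{
                    "match_bool_prefix": {
                        field: {"query": text, "analyzer": analyzer, "operator": "and"}
                    }
                }],
                # Optional clause: extra score when all terms match whole tokens.
                "should": [{
                    "match": {
                        field: {
                            "query": text,
                            "analyzer": analyzer,
                            "operator": "and",
                            "boost": 2.0,  # arbitrary illustrative value
                        }
                    }
                }],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_address_query("lenina 1"), indent=2))
```

The resulting dictionary can be sent as the request body of a `_search` call. On older Elasticsearch versions without `match_bool_prefix`, the `must` clause would have to be assembled differently.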

1 answer(s)

chupasaurus, 2017-07-21
@chupasaurus

Apply the analyzer below: add it to the index settings and assign it to the required fields:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "address_analyzer_toster": {
          "tokenizer": "whitespace",
          "char_filter": ["useless_symbols"],
          "filter": ["useless_words"]
        }
      },
      "char_filter": {
        "useless_symbols": {
          "type": "pattern_replace",
          "pattern": "[,\\.:]",
          "replacement": ""
        }
      },
      "filter": {
        "useless_words": {
          "type": "stop",
          "stopwords": ["ул", "д", "проезд", ...]
        }
      }
    }
  }
}
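
To see what such an analyzer would do to an address string, here is a rough pure-Python imitation of its three stages. It is only a sketch: it assumes the char filter is meant to strip just the characters `,`, `.` and `:`, it uses only the three stop words visible above (the real list is longer), and the function name `analyze` is made up for illustration. It does not call Elasticsearch.

```python
import re

# Assumption: the char filter should drop only ',', '.' and ':'.
USELESS_SYMBOLS = re.compile(r"[,.:]")
# Only the stop words shown in the answer; the full list is elided there.
USELESS_WORDS = {"ул", "д", "проезд"}


def analyze(text):
    """Imitate: pattern_replace char filter -> whitespace tokenizer -> stop filter."""
    cleaned = USELESS_SYMBOLS.sub("", text)   # char filter runs on the raw string
    tokens = cleaned.split()                  # whitespace tokenizer
    return [t for t in tokens if t not in USELESS_WORDS]  # stop filter


print(analyze("ул. Ленина, д. 12"))  # ['Ленина', '12']
```

With the street and house markers removed, "Lenina d. 1" and the query "Lenina 1" reduce to the same tokens. Note that nothing here lowercases the text, so matching stays case-sensitive unless a lowercase filter is added to the chain.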