How to implement a postal address correction system?

Algorithms

babaevmm, 2016-06-21 16:45:49

Hello!
Could you point me in the right direction on normalizing postal addresses to a canonical form and finding the postal code (index) for them? Four years ago at work I wrote a system for working with KLADR. The gist: addresses are written down in free form, not only with spelling errors but also with distorted content (say, the district is specified incorrectly), and the matching postal code had to be found. Everything was implemented by training the system: when there was no match, an operator assigned the correspondence manually, and the next time the system could determine the corresponding address node on its own. The system is now outdated and there is no one left to train it. We decided to build something universal, like the DaData service and similar ones, but our own and running locally.
I have read articles about fuzzy search algorithms. But here is a question for anyone familiar with this subject area: which direction should I dig in? Which algorithms should be considered first? Perhaps the choice can be narrowed down given the subject area.
Thanks in advance!

2 answer(s)

Dimonchik, 2016-06-21
@dimonchik2013

https://habrahabr.ru/post/232347/
https://habrahabr.ru/company/hflabs/blog/260601/
https://habrahabr.ru/company/hflabs/blog/254757/

Adamos, 2016-06-22
@Adamos

The address most likely still indicates the region (look for it first) or at least the city (look for it second). Once those are determined, the number of candidate streets is no longer so scary.
Keywords such as "street", "district", etc. help distinguish, for example, Bolshaya Moskovskaya street from the Moscow region.
Then compute the Levenshtein distance between each known correct option and the fragments of the input string. The option with the smallest distance is taken as the correct one...
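
The steps above could be sketched roughly as follows. This is only a minimal illustration under several assumptions: the `DIRECTORY` dictionary is a made-up toy stand-in for a real address classifier such as KLADR, the keyword lists are illustrative English placeholders, the input is assumed to be comma-separated, and the function names (`split_levels`, `best_match`, `resolve`) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical toy directory: region -> city -> streets (not real KLADR data).
DIRECTORY = {
    "Moscow": {
        "Podolsk": ["Bolshaya Moskovskaya", "Kirova"],
        "Khimki": ["Lenina", "Pobedy"],
    },
    "Tver": {"Tver": ["Sovetskaya", "Lenina"]},
}

# Illustrative keywords that reveal which level a fragment belongs to.
LEVEL_KEYWORDS = {
    "region": ("region", "oblast"),
    "city": ("city", "town"),
    "street": ("street", "st."),
}


def split_levels(raw: str):
    """Split on commas; tag a fragment with a level if it contains a keyword."""
    tagged, untagged = {}, []
    for part in raw.split(","):
        words = part.lower().split()
        for level, keywords in LEVEL_KEYWORDS.items():
            if any(w in keywords for w in words):
                tagged[level] = " ".join(w for w in words if w not in keywords)
                break
        else:
            if words:
                untagged.append(" ".join(words))
    return tagged, untagged


def best_match(fragments, candidates):
    """Return (distance, candidate, fragment) with the smallest edit distance."""
    return min(
        (levenshtein(f, c.lower()), c, f) for f in fragments for c in candidates
    )


def resolve(raw: str):
    """Match region first, then city, then street, narrowing candidates each time."""
    tagged, untagged = split_levels(raw)
    node, result = DIRECTORY, []
    for level in ("region", "city", "street"):
        fragments = [tagged[level]] if level in tagged else untagged
        if not fragments:
            break  # nothing left in the input to match at this level
        _, name, used = best_match(fragments, list(node))
        if level not in tagged:
            untagged.remove(used)  # a fragment is consumed by one level only
        result.append(name)
        if level != "street":
            node = node[name]
    return tuple(result)


print(resolve("Moskow region, Podolsk city, Bolshaya Moskovskya street"))
```

In a real system you would probably also want a distance threshold, so that a poor best match is rejected rather than accepted, and an index to avoid comparing against every candidate; both are left out of this sketch.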