How similar are the strings (Иван Петров and Ivan Petrov)?

Programming

FilimoniC, 2013-04-25 22:17:19

Good afternoon!
Here is the problem:
We are migrating from one infrastructure to another, and I need to link each full name written in Russian to the same full name in transliteration.
Available data:
The complete, correct full name in Russian (Ivanov Petr Fedorovich) and a transliterated name in one of several forms (Ivanov Petr / Ivanov Petr F. / Ivamov Petr Fedorovich).
Required:
A function that accepts a full name transliterated from the complete Russian full name by a robot (strictly by the rules) and a list of pairs [full name; id], where the full name was transliterated by a person, almost by the rules. The names in the list may lack a patronymic, or the patronymic may be abbreviated.
The function should return a list in the format
TranslitFullName | id | similarity score
For example,
F(toTranslit('Ivanova Yulia Mymrova'),$ListTranslit)
will give
Ivanova Yulia Mymrova | BBB1123 | 130
Ivanova Julia M. | AAA5543 | 100
Ivanova Ylia | CCC2234 | 95
Please tell me whether a ready-made solution exists, or at least an algorithm for comparing strings that gives a similarity estimate in some arbitrary units.

4 answer(s)

jdponomarev, 2013-04-25
@jdponomarev

Quite applicable.
You can try building all possible transliterations for each full name, then compute the Levenshtein distance against every candidate and pick the most similar full name.
The transliteration scheme can be found here:
akmac.narod.ru/st/st9.htm
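
The approach above could be sketched roughly as follows. This is only an illustration: the function names `levenshtein` and `rank_candidates` and the hand-written list of transliteration variants are assumptions, not part of the original answer, and the score here is a raw edit distance (lower means more similar), not the 130/100/95 scale from the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb
            ))
        previous = current
    return previous[-1]


def rank_candidates(variants, candidates):
    """Score each (name, id) pair by its best distance to any variant.

    variants   -- robot-made transliterations of one Russian full name
    candidates -- list of (human_transliterated_name, id) pairs
    Returns (name, id, distance) tuples, most similar first.
    """
    scored = []
    for name, ident in candidates:
        best = min(levenshtein(v.lower(), name.lower()) for v in variants)
        scored.append((name, ident, best))
    return sorted(scored, key=lambda row: row[2])


# Hypothetical variants; a real list would come from the transliteration rules.
variants = ["Ivanova Yulia Mymrova", "Ivanova Julia Mymrova"]
candidates = [
    ("Ivanova Ylia", "CCC2234"),
    ("Ivanova Yulia Mymrova", "BBB1123"),
    ("Ivanova Julia M.", "AAA5543"),
]
ranked = rank_candidates(variants, candidates)
for name, ident, distance in ranked:
    print(f"{name} | {ident} | {distance}")
```

A missing or abbreviated patronymic inflates the raw distance, so in practice it may be worth comparing the surname and first name separately from the patronymic.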

ssbb, 2013-04-26
@ssbb

About determining the similarity of strings: habrahabr.ru/qa/1186/

Otkrick, 2013-04-26
@Otkrick

Sphinx can do this out of the box, and so can PostgreSQL. As for Levenshtein's method of catching errors, it caused problems for names like Chon ("Chon" / "Chyohn").
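
To illustrate the short-name problem: two extra letters are a large share of a four-letter name, so a raw edit count looks bad even though the names are clearly related. One possible mitigation is a length-normalized score. The sketch below uses `SequenceMatcher` from Python's standard `difflib` module purely as an example of such a score; it is not what Sphinx or PostgreSQL do internally, and the helper name `similarity` is an assumption.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two spellings of the same short name versus an unrelated name.
same_name = similarity("Chon", "Chyohn")
unrelated = similarity("Chon", "Petrov")
print(same_name > unrelated)
```

A threshold on such a score still has to be tuned on real data, since short names leave little room between "same person" and "different person".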

xmoonlight, 2016-02-10
@xmoonlight

How to determine the similarity of two strings?