How similar are the strings (Иван Петров and Ivan Petrov)?
Good afternoon!
Here is the task:
Because of a migration from one infrastructure to another, I need to match full names written in Russian with the same full names written in transliteration.
Available data:
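The correct, complete full name in Russian (Иванов Петр Федорович) and a transliterated name in one of several forms: surname and first name, surname and first name with an initial, or the complete full name, possibly with typos (Ivanov Petr \ Ivanov Petr F. \ Ivamov Petr Fedorovich)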
Required: a function that accepts a full name transliterated by a robot from the complete Russian full name (strictly according to the rules) and a list of pairs [full name; id], where each full name was transliterated by a person, almost according to the rules. Moreover, a name in the list may lack the patronymic, or the patronymic may be abbreviated.
It must return a list in the format: transliterated full name | id | degree of similarity
That is, for example,
F(toTranslit('Ivanova Yulia Mymrova'),$ListTranslit)
will give
Ivanova Yulia Mymrova | BBB1123 | 130
Ivanova Julia M. | AAA5543 | 100
Ivanova Ylia | CCC2234 | 95
Is there an existing solution for this, or at least a string-comparison algorithm that gives a similarity estimate in some arbitrary units?
Quite applicable.
You can try building all possible transliterations of each full name, then computing the Levenshtein distance to every candidate and picking the most similar full name.
Here you can see the transliteration scheme
akmac.narod.ru/st/st9.htm
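To make the idea concrete, here is a rough Python sketch of "build all transliterations, then compare by Levenshtein distance". Everything in it is an assumption for illustration: the `VARIANTS` table is hypothetical and deliberately incomplete (fill it in from a real scheme such as the one linked above), the function names `transliterations`, `levenshtein` and `rank` are mine, and the `ZZZ0001` entry is made up. It returns a raw edit distance (smaller is better), not the similarity score from the question.

```python
from __future__ import annotations
from itertools import product

# Hypothetical, incomplete table: each Cyrillic letter maps to the Latin
# spellings a person might plausibly type. Extend it from a real scheme.
VARIANTS = {
    "а": ["a"], "в": ["v"], "и": ["i"], "л": ["l"], "н": ["n"], "о": ["o"],
    "ю": ["yu", "ju", "iu"], "я": ["ya", "ja", "ia"], " ": [" "],
}

def transliterations(name: str) -> set[str]:
    """All lowercase Latin spellings of `name` allowed by VARIANTS."""
    options = [VARIANTS.get(ch, [ch]) for ch in name.lower()]
    return {"".join(parts) for parts in product(*options)}

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def rank(russian_name: str, candidates: list[tuple[str, str]]) -> list[tuple[str, str, int]]:
    """Return (name, id, distance), best match first.

    The distance of a candidate is the minimum over all generated variants.
    """
    variants = transliterations(russian_name)
    scored = [(name, cid, min(levenshtein(v, name.lower()) for v in variants))
              for name, cid in candidates]
    return sorted(scored, key=lambda row: row[2])

if __name__ == "__main__":
    people = [("Ivanova Ylia", "CCC2234"),
              ("Ivanova Julia", "AAA5543"),
              ("Petrova Anna", "ZZZ0001")]  # made-up entry for contrast
    for name, cid, dist in rank("Иванова Юлия", people):
        print(f"{name} | {cid} | {dist}")
```

Keep in mind that the number of variants grows multiplicatively with every ambiguous letter, so for long names it may be cheaper to normalise both sides to one canonical spelling before comparing.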
Sphinx can do it out of the box, and PostgreSQL can too. Levenshtein's method is meant for finding typos, though, and it caused problems with names like Chon ("Chon" / "Chyohn").
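A small Python illustration of why short names are awkward for plain Levenshtein. This is only a sketch: the `similarity` helper (distance scaled by the longer string) is my own assumption, not something Sphinx or PostgreSQL computes, and "Chen" is just an example of a different name that happens to be one letter away.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0..1 by the longer string (1.0 = identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest

# Two spellings of the same name versus a different name one letter apart.
print("Chon / Chyohn:", similarity("Chon", "Chyohn"))
print("Chon / Chen:  ", similarity("Chon", "Chen"))
```

The two spellings of the same name end up further apart than the two different names, so for short names it probably makes sense to combine the distance with transliteration variants rather than rely on it alone.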