How to normalize Russian names?
Good afternoon.
There is a text/CSV file (the format doesn't matter) with some user information, including a "Name" field.
For example: "sanya", "sasha", "aleksandr", and so on.
That is, the same name appears in a variety of spellings.
The task is to bring all of them to a single form, e.g. "Alexander".
I searched all over Google and Yandex but found nothing on the topic.
Has anyone faced similar issues?
Perhaps someone can tell me where to get a dictionary for this?
The implementation language doesn't matter; I'm interested in the algorithm and, if one exists, the dictionary of names itself.
Thanks in advance to all who respond!
I don't know of any ready-made solutions, but perhaps this approach will do:
1. Parse the list of names from Wikipedia: en.wikipedia.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%...
2. Parse the page about each name, say: en.wikipedia.org/wiki/%D0%90%D0%BB%D0%B5%D0%BA%D1%... and extract the derived forms.
3. Put all of this into a simple database or an XML/JSON file.
4. Try it out, edit the database, and add exotic variants. Names that are missing from the database entirely (including typos) are left for manual editing.
Instead of parsing the live wiki, you can download a copy of it from torrents for this purpose. If you do parse it online, use the mobile version.
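Steps 3 and 4 could look roughly like the sketch below. It assumes the dictionary has already been collected into JSON of the form "full name: list of variants"; the sample data, the `build_index` and `normalize_names` helpers, and the case-insensitive matching are illustrative assumptions, not something taken from a ready-made library.

```python
import json

# Illustrative data only: the variants are the ones from the question.
# A real dictionary would be filled from the parsed name pages.
NAMES_JSON = '{"Alexander": ["sanya", "sasha", "aleksandr"]}'


def build_index(canonical_to_variants):
    """Invert {full name: [variants]} into {variant: full name}."""
    index = {}
    for canonical, variants in canonical_to_variants.items():
        # The full name itself should also resolve to itself.
        index[canonical.casefold()] = canonical
        for variant in variants:
            index[variant.strip().casefold()] = canonical
    return index


def normalize_names(raw_names, index):
    """Return (normalized, unknown).

    Names missing from the index (typos, exotic variants) are passed
    through unchanged and also collected for manual editing.
    """
    normalized, unknown = [], []
    for raw in raw_names:
        key = raw.strip().casefold()
        if key in index:
            normalized.append(index[key])
        else:
            normalized.append(raw)
            unknown.append(raw)
    return normalized, unknown


index = build_index(json.loads(NAMES_JSON))
normalized, unknown = normalize_names(["Sanya", " sasha ", "aleksndr"], index)
print(normalized)
print(unknown)
```

Anything that ends up in `unknown` is a candidate either for a new dictionary entry or for manual correction, which matches the "try, edit the database" loop in step 4.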
You can add the dictionary of names on Gramota.ru to @Gorily's approach, but there is a big pitfall: some short forms of names correspond to several full names.
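One way to handle that pitfall is to let a short form map to a set of full names and to normalize automatically only when the set has exactly one element. The sketch below is an assumption about how this might be done; the sample data and the `build_multi_index` and `resolve` helpers are hypothetical.

```python
from collections import defaultdict

# Illustrative data only: "sasha" is listed under two full names
# to show the ambiguity.
SAMPLE = {
    "Alexander": ["sasha", "aleksandr"],
    "Alexandra": ["sasha"],
}


def build_multi_index(canonical_to_variants):
    """Map each variant to the set of all full names it may stand for."""
    index = defaultdict(set)
    for canonical, variants in canonical_to_variants.items():
        for variant in variants:
            index[variant.strip().casefold()].add(canonical)
    return index


def resolve(raw, index):
    """Return (full_name, candidates).

    full_name is set only when the variant is unambiguous; otherwise it
    is None and candidates lists the options for manual review.
    """
    candidates = index.get(raw.strip().casefold(), set())
    if len(candidates) == 1:
        return next(iter(candidates)), []
    return None, sorted(candidates)


multi_index = build_multi_index(SAMPLE)
print(resolve("aleksandr", multi_index))
print(resolve("Sasha", multi_index))
print(resolve("unknown", multi_index))
```

Ambiguous entries can then be settled using other columns in the file (for example a gender or patronymic field, if the data has one) or left for manual editing.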