Python
rkhokhorin, 2020-07-29 18:18:04

How to compare names by a part (or a derivative form) in Python?

I have two names that I need to compare. The problem is that I don't know how either of them will be spelled: they may be Dmitry and Dmitry, or Dimochka and Dimon, or one of the names may be misspelled. I need to decide whether they are the same name or not. For this I came up with the following algorithm:

  1. Transliterate the names (if they contain non-Cyrillic characters)
  2. Stem the names
  3. Compare the resulting stems using the Levenshtein distance (allowing at most 2 errors)

But this algorithm has a problem. Suppose I receive Dima and Dmitry. No transliteration is needed here. After stemming we get dim and dmitr, and, as you can guess, the Levenshtein distance says these are different names. Hence the question: how can I tell that Dima = Dmitry or Dim = Dmitry? Are there Python libraries for this kind of search? Or is there some algorithm? Or is my approach completely wrong?
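The failure described above can be reproduced with a few lines of code. This is a minimal sketch of step 3 only, using a plain dynamic-programming Levenshtein distance; the function names `levenshtein` and `close_enough` are illustrative, and the stems `dim` and `dmitr` are taken from the question as given rather than produced by a real stemmer.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


MAX_ERRORS = 2  # the limit from step 3 of the question


def close_enough(a: str, b: str, max_errors: int = MAX_ERRORS) -> bool:
    """True if the two strings differ by at most max_errors edits."""
    return levenshtein(a, b) <= max_errors


print(levenshtein("dim", "dmitr"))   # 3
print(close_enough("dim", "dmitr"))  # False
```

The distance between the two stems is 3, one more than the allowed 2, so a pure edit-distance check rejects the pair even though both stems come from the same name.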

3 answer(s)
grecha10, 2020-07-29
@rkhokhorin

If this is a practical task rather than a learning exercise, you are overcomplicating the implementation. It is simpler and more convenient to use a table of names. Besides, a table is the only way to match names that are the same but sound completely different, for example Georgy and Zhora, or Anna and Nyura.
To build a table of names, an ordinary list is enough. For example:

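# Each tuple groups the variants of one name.
names = [
    ('Саша', 'Александр'),
    ('Георгий', 'Жора'),
    ('Лена', 'Леночка', 'Lena'),
]

name1 = 'Жора'
name2 = 'Георгий'
# The names match if some group contains both of them.
for group in names:
    if name1 in group and name2 in group:
        print(name1, name2, 'same')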
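For a larger table, the linear scan over the list could be replaced by a dictionary lookup. The sketch below is one possible way to do that, not part of the original answer: `NAME_GROUPS`, `build_index` and `same_name` are hypothetical names, and the case-insensitive comparison is an added assumption.

```python
# Groups of name variants, taken from the example above; extend as needed.
NAME_GROUPS = [
    ("Саша", "Александр"),
    ("Георгий", "Жора"),
    ("Лена", "Леночка", "Lena"),
]


def build_index(groups):
    """Map every variant (case-folded) to the number of its group."""
    index = {}
    for group_id, group in enumerate(groups):
        for variant in group:
            index[variant.casefold()] = group_id
    return index


NAME_INDEX = build_index(NAME_GROUPS)


def same_name(a, b, index=NAME_INDEX):
    """True only if both names are known and belong to the same group."""
    group_a = index.get(a.strip().casefold())
    group_b = index.get(b.strip().casefold())
    return group_a is not None and group_a == group_b


print(same_name("Жора", "георгий"))  # True
print(same_name("Саша", "Лена"))     # False
```

Names missing from the table are treated as non-matching, so a fuzzy comparison would still be needed as a fallback for misspellings.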

Andrey, 2020-07-29
@anerev

The best option is to find a library for such comparisons. The first thing a search turned up was https://antonz.ru/difflib-ratio/
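As the URL suggests, the linked article is about the similarity ratio in the standard `difflib` module. A minimal sketch of how it might be applied to names follows; the helper name `similarity` and the idea of lowercasing first are assumptions, not something taken from the article.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


for first, second in [("Dmitry", "Dmitriy"), ("Dima", "Dmitry")]:
    print(first, second, round(similarity(first, second), 2))
```

Spelling variants such as Dmitry and Dmitriy score high, while Dima and Dmitry score low, so this approach handles typos but has the same weakness with diminutives as the Levenshtein distance.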

xmoonlight, 2020-07-29
@xmoonlight

Hence the question: how can I tell that Dima = Dmitry or Dim = Dmitry?

Use these checks only if the distance is large!
First, equalize the "weights" of some letters: e=e, p=w=n.
Checks:
1. Swap adjacent letters at the start of the name: 123 -> 132 (Dima -> Dmitri) or 123 -> 213 (...)
2. Shift by adding a new letter in front: 12 -> 012 (Lena -> Elena, Katya -> Ekaterina, Lesha -> Aleksey)
3. Find the first two letters as a pattern inside the word, then check for either two consonants in a row or a match of 3 letters in a row: Sasha -> Aleksandr, Kesha -> Innokentiy.
This works without a dictionary, but with a dictionary it will be more reliable!
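Checks 1 and 2 could be sketched roughly as follows. This is only one loose reading of the answer: the function names are hypothetical, the names are assumed to be already transliterated and lowercased, and check 3 is left out because its rule is not specified precisely enough to implement.

```python
def swapped_prefixes(short: str, n: int = 3):
    """Yield the first n letters of `short` with each adjacent pair swapped."""
    head = short[:n]
    for i in range(len(head) - 1):
        chars = list(head)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        yield "".join(chars)


def matches_by_swap(short: str, full: str) -> bool:
    """Check 1: 123 -> 132 or 123 -> 213, then compare with the start of `full`."""
    return any(full.startswith(prefix) for prefix in swapped_prefixes(short))


def matches_by_shift(short: str, full: str, k: int = 2) -> bool:
    """Check 2: the first k letters of `short` follow one extra leading letter."""
    return len(short) >= k and full[1:1 + k] == short[:k]


print(matches_by_swap("dima", "dmitri"))      # True
print(matches_by_shift("katya", "ekaterina"))  # True
```

Such prefix rules are very permissive and will also accept unrelated names that happen to share a few letters, which is consistent with the answer's remark that a dictionary is more reliable.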
