PHP
Pavel Bykov, 2022-03-04 12:55:46

Is it possible to tell a first name from a surname using some set of rules?

Hello everyone. A client has a database of full names, but the trouble is that some values do not match their table fields: the firstname field may contain a surname, and the lastname field may contain a first name and a patronymic together. I keep wondering: are there any rules in the Russian language that could be translated into code to decide whether a given word is a first name or a surname?

Another difficulty is that the full names are not only Russian; there are also names of people from other countries...


4 answers
Ilya, 2022-03-04
@mafof

Take a look at the DaData service; they seem to offer some kind of name normalization.

Adamos, 2022-03-04
@Adamos

https://github.com/seagullua/NameCaseLib - it can also guess which part of a full name each word is.

Vladimir Korotenko, 2022-03-04
@firedragon

Take a list of Russian first names and add whatever national names you need.
Do the same for surnames.
For first names that comes to about 300-400 entries.
For surnames, I think 10,000-20,000.
Put a filter on the input; as a bonus, typos get caught right away.
If at least one field of the form fails the check, show an "Are you sure?" confirmation.
Now, about the existing contacts.
I once had code that split the string on spaces and took the first and last values.
In some cases it got the result wrong.
Example: "John Deere MD".
I simply extracted the prefixes and suffixes from a database of 2 million records, discarding the normal names.
That left about 200 records, and all 200 were covered by 12 replacement rules.
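
A minimal sketch of the dictionary approach described above, in Python for brevity (nothing here depends on the language, so it should port to PHP without much trouble). Everything in it is an assumption made for illustration: the tiny name sets stand in for the real lists of a few hundred first names and thousands of surnames, `TITLES` is a hypothetical stand-in for the dozen cleanup rules, and the patronymic suffix check is just the usual Russian pattern (-ович, -евич, -ич, -овна, -евна, -ична), not something taken from the answer.

```python
import re

# Tiny illustrative samples; real lists would be far larger.
KNOWN_FIRST_NAMES = {"иван", "мария", "john", "anna"}
KNOWN_SURNAMES = {"иванов", "петрова", "deere", "smith"}

# Hypothetical titles to drop before classification ("John Deere MD").
TITLES = {"md", "phd", "mr", "mrs", "dr", "jr"}

# Common Russian patronymic endings (male and female forms).
PATRONYMIC_RE = re.compile(r"(ович|евич|ич|овна|евна|ична)$")


def classify_token(token):
    """Return 'first_name', 'last_name', 'patronymic' or 'unknown'."""
    word = token.lower()
    # Dictionaries win over the suffix rule, so a surname such as
    # "Абрамович" can be listed explicitly and not be taken for a patronymic.
    if word in KNOWN_FIRST_NAMES:
        return "first_name"
    if word in KNOWN_SURNAMES:
        return "last_name"
    if PATRONYMIC_RE.search(word):
        return "patronymic"
    return "unknown"


def parse_full_name(raw):
    """Split a raw string and assign each word to a name part."""
    tokens = [t.strip(".,") for t in raw.split()]
    tokens = [t for t in tokens if t and t.lower() not in TITLES]
    result = {"first_name": None, "last_name": None,
              "patronymic": None, "unknown": []}
    for token in tokens:
        kind = classify_token(token)
        # Unrecognized words and duplicates of an already filled part
        # are kept aside for manual review instead of being guessed.
        if kind == "unknown" or result[kind] is not None:
            result["unknown"].append(token)
        else:
            result[kind] = token
    return result
```

Calling `parse_full_name` on the concatenation of whatever sits in the firstname and lastname columns gives a proposed split regardless of which field the words were typed into; anything that lands in `unknown` is a candidate for a manual check.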

d-sem, 2022-03-04
@d-sem

Another difficulty is that the full names are not only Russian; there are also names of people from other countries...

Even among names typical for the Russian Federation there are edge cases that complicate name recognition.
Whatever result you get will have to be checked by hand in the end, which I strongly recommend doing.
What I have run into over a couple of years of working with full names:
1) A person has only a first name and nothing else;
2) A person has four names (and since they are also long, they are a nightmare for any document; Russified Lutherans);
3) The first name and the surname are identical (Balkan peoples);
4) Diminutive forms of names used in the passport (Seryozha, Dima);
5) The dilemma of what counts as a patronymic and what counts as a first name among Turkic peoples;
6) Digits in the name;
7) Misspelled names (Vladimer).
All in all, here is a good write-up on the problems with names: https://habr.com/en/post/146901/
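
Some of the cases in this list can at least be detected mechanically, so that the records are queued for the manual check instead of being "fixed" automatically. A rough Python sketch, assuming each record exposes its fields as plain strings; the function name and the flag names are made up for illustration, and cases 4, 5 and 7 are not covered because they need dictionaries or a human.

```python
import re


def review_flags(first_name, last_name, patronymic=""):
    """Return a list of reasons why a record needs a manual look."""
    parts = [(p or "").strip() for p in (first_name, last_name, patronymic)]
    filled = [p for p in parts if p]
    flags = []

    # Case 1: the person has a single name and nothing else.
    if len(filled) == 1 and len(filled[0].split()) == 1:
        flags.append("single_name")

    # Case 2 and the situation from the question: several words in one
    # field (extra names, or first name plus patronymic in lastname).
    if any(len(p.split()) > 1 for p in parts):
        flags.append("several_words_in_one_field")

    # Case 3: first name and surname are identical.
    if parts[0] and parts[0].lower() == parts[1].lower():
        flags.append("first_equals_last")

    # Case 6: digits somewhere in the name.
    if any(re.search(r"\d", p) for p in parts):
        flags.append("contains_digit")

    return flags
```

An empty list only means that none of these particular checks fired, not that the record is correct.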
