How to quickly search a database by full name?

PHP

rudejah, 2014-12-16 09:12:34

Good day!
I work on parsing payment reports from banks.
To correctly determine who paid and what the payment was for, I need a personal account number or a full name, and if neither is present, then the address.
All of this information may be written in a chaotic manner, or not written at all.
I have managed to extract the personal account number.
Now I want to try extracting the full name as well.
How is the full name written? Any way at all! For example, like this:
1. petrov i i
2. Petrov Ivan I
3. Petrov I Ivanovich
4. petrov i. i.
5. PETROV IVAN IVANOVICH
6. Payment for Petrov Ivan Ivanovich
7. Petrov I.I.
8. I. I. Petrov
9. I. I. PETROV
10. PETROV i i
11. N. N. Petrov and Zh.
and all similar variants.
The comparison is made against billing records, which were entered in the same chaotic manner =(
My idea was this: lowercase both sides and check whether the first name and patronymic match in full. If they do, good; if not, reduce each to its first letter and compare again.
Maybe there are some other options? Or someone else
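
The idea above (lowercase both sides, compare the first name and patronymic in full, then fall back to first letters) might look roughly like the sketch below. It is written in Python for brevity and the same logic ports to PHP. The names `tokens`, `name_parts_match` and `same_person` are hypothetical, and the sketch assumes the surname comes first in both strings, so variants such as "I. I. Petrov" or "Payment for ..." are not handled here.

```python
import re


def tokens(raw: str) -> list:
    """Lowercase the text and return its alphabetic tokens (dots and digits are dropped)."""
    return re.findall(r"[^\W\d_]+", raw.lower())


def name_parts_match(a: str, b: str) -> bool:
    """Two name parts match if they are equal, or if one is a bare initial of the other."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return False


def same_person(payment_text: str, billing_name: str) -> bool:
    """Compare two 'surname first-name patronymic' strings.

    Assumption: the surname is the first token on both sides.
    """
    left, right = tokens(payment_text), tokens(billing_name)
    if not left or not right or left[0] != right[0]:
        return False
    # Compare first name and patronymic pairwise; parts missing on one side are ignored.
    return all(name_parts_match(a, b) for a, b in zip(left[1:], right[1:]))
```

With this sketch, `same_person("petrov i. i.", "PETROV IVAN IVANOVICH")` is true, while two different full first names under the same surname do not match. Note that initials alone are a weak signal: "Petrov I. I." also matches "Petrov Igor Ilyich".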

2 answer(s)

Nikolai Pavlov, 2014-12-16
@gurinderu

Try this:
SELECT * FROM some_table WHERE BINARY name = :text_to_find

Viktor Vsk, 2014-12-21
@viktorvsk

As for what to do with options 6 and 11, at first glance I have no idea.
In the other cases, if "at least a partial match" is good enough, I would suggest converting everything to lower case and treating each name as three elements (components). One of them is the surname; the other two are the first name and patronymic (in any order, as initials or in full, and so on). Then compare against a pre-prepared database of surnames. I think that for the Russian-speaking population, a list covering 80% of cases will still not be very large.
The algorithm goes like this. Among the three elements, find the word that matches a surname. Then check the remaining two elements: first for an exact match (or, as an option, with a fuzzy comparison algorithm and an experimentally chosen threshold), and if that does not help, by initials.
It seems to me the result could be satisfactory for a first approximation, and the implementation is simple enough to check quickly. After that, adjust based on how effective it is and on your goals.
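
The surname-first lookup described in this answer could be sketched as follows, in Python for brevity. Everything named here is an assumption for illustration: `KNOWN_SURNAMES` stands in for the pre-prepared surname database, `FUZZY_THRESHOLD` is an arbitrary starting value, and the fuzzy comparison uses the standard library's `difflib.SequenceMatcher` only because it needs no extra dependencies. The sketch also assumes the text holds only the three name elements, with the first name before the patronymic, so option 6 still fails.

```python
import re
from difflib import SequenceMatcher

# Assumption: a tiny stand-in for the pre-prepared surname database.
KNOWN_SURNAMES = {"petrov", "ivanov", "sidorov"}
# Assumption: an arbitrary starting threshold; tune it on real data.
FUZZY_THRESHOLD = 0.8


def tokens(raw):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", raw.lower())


def split_name(raw):
    """Return (surname, remaining_parts), or None if no known surname is present."""
    parts = tokens(raw)
    for i, part in enumerate(parts):
        if part in KNOWN_SURNAMES:
            return part, parts[:i] + parts[i + 1:]
    return None


def part_matches(a, b):
    """Exact match first, then fuzzy match for full words, then initials."""
    if a == b:
        return True
    if len(a) > 1 and len(b) > 1:
        return SequenceMatcher(None, a, b).ratio() >= FUZZY_THRESHOLD
    return a[0] == b[0]


def match(payment_text, billing_name):
    """True if both strings share a known surname and compatible remaining parts."""
    left, right = split_name(payment_text), split_name(billing_name)
    if left is None or right is None or left[0] != right[0]:
        return False
    return all(part_matches(a, b) for a, b in zip(left[1], right[1]))
```

Because the surname is located by dictionary lookup rather than by position, `match("I. I. Petrov", "PETROV IVAN IVANOVICH")` succeeds, and a misspelling such as "ivanovitch" still passes the fuzzy step. A surname missing from the dictionary makes the match fail, so the coverage of that list directly limits recall.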