Answer the question
In order to leave comments, you need to log in
Fetching similar results / MySQL
There is a base with approximately the following content:
It is necessary to select and count the number of similar records using PHP / MySQL.
For example:
How to implement?
Пупкин Вася
Пупкин В.
Вася Пупкин
Фамилькин Л.
Людвиг Аристархович Фамилькин
Фамилькин
Держиморда Кирилл
Пупкин - 3 шт.
Фамилькин - 3 шт.
Держиморда - 1 шт.
Answer the question
In order to leave comments, you need to log in
You first decide how you will select the cluster. If you add “Ludwig Aristarkhovich Pupkin” to your list, where will you take him? To the Pupkins, or to the Ludwigs, or to both at the same time?
Once you decide how to select a cluster, you can use the Levenshtein distance to determine if you belong to a cluster.
I. If you want to count the number of identical words , that is:
Pupkin - occurs 3 times
Vasya - occurs 2 times, etc., then this can be done either using a stored procedure, or another iteration method (in php), or (better ) normalize the data so that there is only one word in one line, then it will be possible to use the GROUP BY operator.
II. If you want to count the number of identical surnames
(which I think you want), then again it is better to
normalize the database according to this principle: other patronymic-specific endings, otherwise take the next longest word in the line.
2. Take the shortest word in the line as the name.
Write this data into a table with the Last Name and First Name columns (or some other suitable format), while you can save the keys to the original records. After that, you can use GROUP BY on the Last Name field.
Something like this. Your problem is in unnormalized data.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question