Fetching similar results / MySQL

K

Kir Shatrov2010-12-22 20:13:30

PHP

Kir Shatrov, 2010-12-22 20:13:30

There is a base with approximately the following content: It is necessary to select and count the number of similar records using PHP / MySQL. For example: How to implement?
Пупкин Вася Пупкин В. Вася Пупкин Фамилькин Л. Людвиг Аристархович Фамилькин Фамилькин Держиморда Кирилл

Пупкин - 3 шт. Фамилькин - 3 шт. Держиморда - 1 шт.

Reply

Answer the question

In order to leave comments, you need to log in

3 answer(s)

T

TimTowdy, 2010-12-22
@TimTowdy

You first decide how you will select the cluster. If you add “Ludwig Aristarkhovich Pupkin” to your list, where will you take him? To the Pupkins, or to the Ludwigs, or to both at the same time?
Once you decide how to select a cluster, you can use the Levenshtein distance to determine if you belong to a cluster.

J

JeanLouis, 2010-12-22
@JeanLouis

I. If you want to count the number of identical words , that is:
Pupkin - occurs 3 times
Vasya - occurs 2 times, etc., then this can be done either using a stored procedure, or another iteration method (in php), or (better ) normalize the data so that there is only one word in one line, then it will be possible to use the GROUP BY operator.
II. If you want to count the number of identical surnames
(which I think you want), then again it is better to normalize the database according to this principle: other patronymic-specific endings, otherwise take the next longest word in the line.
2. Take the shortest word in the line as the name.
Write this data into a table with the Last Name and First Name columns (or some other suitable format), while you can save the keys to the original records. After that, you can use GROUP BY on the Last Name field.
Something like this. Your problem is in unnormalized data.

J

Jazzist, 2010-12-23
@Jazzist

If the solution should be non-resource-intensive, and it needs to be done as quickly as possible, add two fields, fill them in word-breaking (by space and trim dot) full name, and search for them OR LIKE