PHP
Jamaludin Osmanov, 2018-05-27 23:46:06

How to find and display contacts with possible typos?

There is a MySQL database with a contacts table that has id and name columns (350,000 contacts). I need to group together contacts whose names may differ only by a typo.
For example:
Timur
Dmitry
Linar
Linur
Temur
Dimitry
Legar
The expected result is:
Timur, Temur
Dmitry, Dimitry
Linur, Linar, Legar
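I checked for typos with the condition levenshtein() < 2, i.e. two names count as a match when they differ by at most one inserted, deleted or substituted character.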
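Because levenshtein() < 2 only asks whether two names are at most one edit apart, the full dynamic-programming table is not strictly needed for that test. A minimal sketch in Python (the question itself uses PHP's built-in levenshtein(); the helper names `levenshtein` and `within_one_edit` here are illustrative, not part of any library) that decides the same condition in a single linear pass:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def within_one_edit(a: str, b: str) -> bool:
    """Equivalent to levenshtein(a, b) < 2, decided in one linear pass."""
    if abs(len(a) - len(b)) > 1:
        return False          # the length gap alone already forces 2+ edits
    if len(a) > len(b):
        a, b = b, a           # make `a` the shorter (or equal-length) name
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] != b[j]:
            edits += 1
            if edits > 1:
                return False
            if len(a) == len(b):
                i += 1        # substitution: also advance in `a`
            j += 1            # in both cases advance past the char in `b`
        else:
            i += 1
            j += 1
    # at most one unmatched trailing character may remain in `b`
    return edits + (len(b) - j) <= 1


for a, b in [("Timur", "Temur"), ("Dmitry", "Dimitry"), ("Linar", "Legar")]:
    print(a, b, levenshtein(a, b), within_one_edit(a, b))
```

This only makes each comparison cheaper; the number of comparisons stays the same.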
I could iterate over the array once and return the result, but the contact database changes constantly, and the result must always be up to date (refreshed at least every 5-10 minutes). Given that the search is very slow (about 1.3 contacts per second), this is not acceptable. Please advise how this task can be accomplished, or how to optimize the search.
The search worked like this:
1. Fetch all contacts from the database.
2. Compare each contact with every other contact using levenshtein() to check for a possible typo.
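Comparing every contact with every other one costs n(n - 1)/2 comparisons; for n = 350,000 that is 350,000 × 349,999 / 2 ≈ 6.1 × 10^10 levenshtein() calls, which explains the slowness. The sketch below reproduces the steps above in Python and adds one piece they leave open: a union-find (disjoint-set) structure, so that chains of matches end up in a single group. `max_distance=1` corresponds to `levenshtein() < 2`; the names `group_all_pairs` and `DisjointSet` are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class DisjointSet:
    """Union-find over indices 0..n-1; each final set is one typo group."""

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def group_all_pairs(names: list[str], max_distance: int = 1) -> list[list[str]]:
    """Compare every name with every other one: n*(n-1)/2 levenshtein calls."""
    ds = DisjointSet(len(names))
    for i, j in combinations(range(len(names)), 2):
        if levenshtein(names[i], names[j]) <= max_distance:
            ds.union(i, j)
    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(ds.find(i), []).append(name)
    return [g for g in groups.values() if len(g) > 1]  # drop names with no match


sample = ["Timur", "Dmitry", "Linar", "Linur", "Temur", "Dimitry", "Legar"]
print(group_all_pairs(sample))
# → [['Timur', 'Temur'], ['Dmitry', 'Dimitry'], ['Linar', 'Linur']]
```

With the `< 2` rule, Legar stays on its own: it is 2 edits away from Linar. Raising the threshold to 2 would pull it in, but would also chain Timur and Linur together, since they are 2 edits apart as well.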


2 answer(s)
d-stream, 2018-05-28
@d-stream

The first thing that comes to mind is to filter out pairs that clearly cannot match. For example, what is the point of comparing a three-letter name with a fifteen-letter one?
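The length filter suggested here can be made precise. The Levenshtein distance between two strings is never smaller than the difference of their lengths, so with a threshold of 1 only names whose lengths differ by at most 1 need to be compared. A sketch that buckets names by length before comparing; a plain dynamic-programming `levenshtein` and a small union-find `DisjointSet` are included so the block runs on its own, and `group_with_length_filter` is a hypothetical name.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class DisjointSet:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def group_with_length_filter(names: list[str], max_distance: int = 1):
    """Group names, skipping pairs whose length gap already exceeds max_distance.

    Returns (groups, number_of_levenshtein_calls)."""
    by_length = defaultdict(list)            # name length -> indices of names
    for idx, name in enumerate(names):
        by_length[len(name)].append(idx)

    ds = DisjointSet(len(names))
    comparisons = 0

    def compare(i: int, j: int) -> None:
        nonlocal comparisons
        comparisons += 1
        if levenshtein(names[i], names[j]) <= max_distance:
            ds.union(i, j)

    for length, idxs in by_length.items():
        # names of the same length
        for pos, i in enumerate(idxs):
            for j in idxs[pos + 1:]:
                compare(i, j)
        # names 1..max_distance characters longer; shorter buckets already
        # compared themselves against this one when they were processed
        for delta in range(1, max_distance + 1):
            for j in by_length.get(length + delta, []):
                for i in idxs:
                    compare(i, j)

    groups = defaultdict(list)
    for i, name in enumerate(names):
        groups[ds.find(i)].append(name)
    return [g for g in groups.values() if len(g) > 1], comparisons


sample = ["Timur", "Dmitry", "Linar", "Linur", "Temur", "Dimitry", "Legar"]
groups, calls = group_with_length_filter(sample)
print(groups, calls)
```

On the seven sample names this makes 16 levenshtein calls instead of 21. On real data the saving depends on how widely name lengths are spread, and the worst case is still quadratic.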
And a more general question: which of the names Timur, Temur and Teimur is the one with the typo?

Sergey, 2018-05-28
@butteff

> But the problem is that the contact database changes constantly, and the result must always be up to date (refreshed at least every 5-10 minutes). Given that the search is very slow (about 1.3 contacts per second), this is not acceptable.

Check the existing data in the database once, and after that check each new contact for typos before writing it to the database.
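Checking a contact before it is written means comparing one new name against the existing ones instead of re-running the full pairwise pass. A minimal sketch, assuming the groups are kept in memory by the application (the class `TypoGroupIndex` and its methods are hypothetical). It compares only against names whose length is within `max_distance` of the new one, and uses a small union-find so that a new name matching two existing groups merges them. Deleting or renaming contacts is not handled.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class TypoGroupIndex:
    """Hypothetical in-memory index that assigns each new contact to a typo group."""

    def __init__(self, max_distance: int = 1):
        self.max_distance = max_distance
        self.parent: dict[int, int] = {}              # union-find: contact id -> parent id
        self.by_length = defaultdict(list)            # name length -> [(contact id, name)]

    def _find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add(self, contact_id: int, name: str) -> int:
        """Register a new contact and return the id of its group's root."""
        self.parent[contact_id] = contact_id
        n = len(name)
        # only names within max_distance characters of this length can match
        for length in range(n - self.max_distance, n + self.max_distance + 1):
            for other_id, other_name in self.by_length.get(length, []):
                if levenshtein(name, other_name) <= self.max_distance:
                    root_other, root_new = self._find(other_id), self._find(contact_id)
                    if root_other != root_new:
                        self.parent[root_new] = root_other  # may merge two groups
        self.by_length[n].append((contact_id, name))
        return self._find(contact_id)

    def groups(self) -> list[list[str]]:
        result = defaultdict(list)
        for bucket in self.by_length.values():
            for contact_id, name in bucket:
                result[self._find(contact_id)].append(name)
        return [sorted(g) for g in result.values() if len(g) > 1]


index = TypoGroupIndex()
for cid, name in enumerate(["Timur", "Dmitry", "Linar", "Linur", "Temur", "Dimitry", "Legar"], start=1):
    index.add(cid, name)
print(index.groups())
```

Each insert still scans every stored name of a nearby length, but that is one linear pass per new contact rather than a quadratic pass over the whole table.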
You could narrow the check down by name with a LIKE condition, but it is probably better to just go through the entire database once.
> Or how to optimize the search
Use indexes and caching.
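For an index-based speed-up at a threshold of 1, one option (swapped in here as an illustration, not something this answer spells out) is the "symmetric delete" trick used by spelling correctors such as SymSpell. Each name is indexed under itself and under every string obtained by deleting one character. Any two names within one edit share at least one such key: a substitution at position k disappears when position k is deleted from both names, and an insertion disappears when it is deleted from the longer name. So only names that share a key need a real Levenshtein check. Sharing a key is not sufficient on its own ("ab" and "ba" share "a" but are 2 edits apart), hence the verification step. The function names are hypothetical. This sketch keeps the index in memory; in MySQL the keys could presumably live in a separate table with an index on the key column.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class DisjointSet:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def deletion_keys(name: str) -> set[str]:
    """The name itself plus every string obtained by deleting exactly one character."""
    return {name} | {name[:k] + name[k + 1:] for k in range(len(name))}


def group_with_deletion_index(names: list[str]) -> list[list[str]]:
    """Group names that are at most one edit apart (the levenshtein() < 2 rule)."""
    index = defaultdict(list)                 # key -> indices of names producing it
    for idx, name in enumerate(names):
        for key in deletion_keys(name):
            index[key].append(idx)

    ds = DisjointSet(len(names))
    verified = set()                          # candidate pairs already checked
    for ids in index.values():
        for pos, i in enumerate(ids):
            for j in ids[pos + 1:]:
                if (i, j) in verified:
                    continue
                verified.add((i, j))
                # a shared key is necessary but not sufficient, so confirm
                if levenshtein(names[i], names[j]) <= 1:
                    ds.union(i, j)

    groups = defaultdict(list)
    for i, name in enumerate(names):
        groups[ds.find(i)].append(name)
    return [g for g in groups.values() if len(g) > 1]


sample = ["Timur", "Dmitry", "Linar", "Linur", "Temur", "Dimitry", "Legar"]
print(group_with_deletion_index(sample))
```

Keys derived from very short names can be shared by many contacts, so in practice one might cap or special-case such buckets.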
You could also use a full-text search engine such as Sphinx (sphinxsearch).
