MySQL
Kanaris, 2015-09-16 20:10:47

How do I select rows by similarity percentage in MySQL?

The records are text (MEDIUMTEXT), 1-100 KB each, and there are 100k-1m of them.
I need to find records that are more than 90% similar so that they can be merged. PHP's similar_text() function does exactly what I need. Is there anything like it in MySQL? SOUNDEX and Levenshtein distance are not suitable. Ideally I would like something like:

SELECT id, SIMILAR_TEXT( 'text to check', str ) AS perc
FROM table
HAVING perc > 90


3 answers
slinkinone, 2015-09-16
@slinkinone

I think you would need to write your own stored function that implements the algorithm behind similar_text().
The algorithm this function is based on is named at php.net/manual/ru/function.similar-text.php.
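
To make clear what such a stored function would have to do, here is a minimal Python sketch of the approach as I understand PHP's implementation: find the longest common substring, then recurse on the parts to its left and to its right. The names similar_chars and similarity_percent are made up for this illustration, and the sketch compares Python characters where PHP compares bytes, so results on non-ASCII text may differ.

```python
def _longest_common_run(a, b):
    """Return (pos_a, pos_b, length) of the first longest common substring."""
    best_a = best_b = best_len = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            # Extend the match while both strings agree.
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            # Strictly greater: the earliest longest run wins ties.
            if k > best_len:
                best_a, best_b, best_len = i, j, k
    return best_a, best_b, best_len


def similar_chars(a, b):
    """Count matching characters: longest common run plus both sides of it."""
    pos_a, pos_b, length = _longest_common_run(a, b)
    if length == 0:
        return 0
    left = similar_chars(a[:pos_a], b[:pos_b])
    right = similar_chars(a[pos_a + length:], b[pos_b + length:])
    return length + left + right


def similarity_percent(a, b):
    """Similarity as a percentage: 2 * matches / (len(a) + len(b)) * 100."""
    if not a and not b:
        return 0.0  # assumption: treat two empty strings as 0% to avoid dividing by zero
    return similar_chars(a, b) * 2.0 * 100.0 / (len(a) + len(b))
```

Note that the result depends on argument order, and the search is cubic in the worst case, so running it pairwise over 1-100 KB texts is likely to be far too slow whether it is written in Python, PHP, or a stored function.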

Immortal_pony, 2015-09-17
@Immortal_pony

www.mysql.ru/docs/man/Fulltext_Search.html

Alex Safonov, 2015-09-17
@elevenelven

This is a relevance-sorting problem.
The theory is covered here:
https://dev.mysql.com/doc/refman/5.0/en/fulltext-n...
And here is a practical example:
jslim.net/blog/2014/01/23/mysql-search-order-by-re...
