Comparison of two or more records in the database
Good afternoon everyone!
I need to compare the texts of two or more database records with each other.
More specifically: a MySQL table has a TEXT field and, say, 20 records. I need to compare their texts with each other for similarity (based on shared words or phrases) and get a similarity percentage.
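A minimal sketch of the pairwise comparison described above. It assumes the TEXT values have already been fetched from MySQL into a dict keyed by record id (the loading step and the sample data are hypothetical). Similarity here is the Jaccard index over word n-grams ("shingles"): with n=1 it counts shared words, with n=2 shared two-word phrases. This is just one possible measure, not the only one.

```python
import re
from itertools import combinations


def shingles(text: str, n: int = 1) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into words, return the set of word n-grams."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_percent(a: str, b: str, n: int = 1) -> float:
    """Jaccard index of the two shingle sets, as a percentage (0..100)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * len(sa & sb) / len(sa | sb)


def compare_all(records: dict[int, str], n: int = 1) -> dict[tuple[int, int], float]:
    """Compare every record with every other one: N*(N-1)/2 pairs."""
    return {
        (id_a, id_b): similarity_percent(text_a, text_b, n)
        for (id_a, text_a), (id_b, text_b) in combinations(records.items(), 2)
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for rows fetched from MySQL.
    records = {
        1: "the quick brown fox jumps over the lazy dog",
        2: "the quick brown fox sleeps",
        3: "completely different text",
    }
    for (a, b), pct in compare_all(records).items():
        print(f"{a} vs {b}: {pct:.1f}%")
```

For 20 records this is only 190 comparisons, so the brute-force loop over all pairs is fine.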
Are there ready-made solutions or scripts that could implement this?
If comparing two tables is what you need, you can install Maatkit (now Percona Toolkit, where the tool is called pt-table-sync) and use its mk-table-sync utility.
Here is an example from a script of mine that deploys a database between two servers:
mk-table-sync --verbose --print --charset=$DB_CHARSET \
  h=$DBHOST_STAGE,P=$DBPORT_STAGE,u=$DBUSER_STAGE,p=$DBPASS_STAGE,t=$TABLE_LIST,D=$DBNAME_STAGE \
  h=$DBHOST_PROD,P=$DBPORT_PROD,u=$DBUSER_PROD,p=$DBPASS_PROD,t=$TABLE_LIST,D=$DBNAME_PROD \
  > $DIR_DB$DB_DATA_SQL
It would help to know what exactly needs to be compared in the texts... Otherwise, there is the standard similar_text function. Or do you need a diff?
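In PHP, similar_text($a, $b, $percent) returns the number of matching characters and writes a percentage into its third argument. Outside PHP, a rough analogue is Python's difflib.SequenceMatcher; note that it uses a different matching algorithm, so its percentages will not be identical to similar_text. If a diff is what's needed, difflib.unified_diff produces one. A small sketch (the function names and the "record_a"/"record_b" labels are made up for illustration):

```python
import difflib


def char_similarity_percent(a: str, b: str) -> float:
    """Character-level similarity in percent, from difflib's ratio().

    ratio() = 2*M / T, where M is the number of matched characters and
    T is the combined length of both strings. Not identical to PHP's similar_text.
    """
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()


def text_diff(a: str, b: str) -> str:
    """Line-by-line unified diff of two texts (empty string if they are equal)."""
    return "".join(
        difflib.unified_diff(
            a.splitlines(keepends=True),
            b.splitlines(keepends=True),
            fromfile="record_a",
            tofile="record_b",
        )
    )
```

Like similar_text, this works on characters, so it says little about shared words or phrases in long texts.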
habrahabr.ru/post/115394/
habrahabr.ru/post/52120/
habrahabr.ru/post/65944/
simhash + minhash
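A minimal MinHash sketch (simhash is a different technique and is not shown). Each text is turned into a set of two-word phrases; for each of num_hashes salted hash functions the signature keeps the minimum hash over the set, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the phrase sets. Using blake2b with a seed prefix as the family of hash functions is an assumption made for this illustration, as are the default k=2 and num_hashes=128.

```python
import hashlib
import re


def word_shingles(text: str, k: int = 2) -> set[str]:
    """Set of k-word phrases from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(value: str, seed: int) -> int:
    """64-bit hash of value; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(f"{seed}:{value}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingles: set[str], num_hashes: int = 128) -> list[int]:
    """For each hash function keep the minimum hash over all shingles."""
    if not shingles:
        raise ValueError("cannot build a MinHash signature for an empty set")
    return [min(_hash(s, seed) for s in shingles) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of positions where the signatures agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

The signatures have a fixed length, so they could be computed once and stored next to each record instead of re-reading the texts. For a few dozen texts exact Jaccard is cheap enough; MinHash pays off on large collections.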
Levenshtein distance
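Levenshtein distance is the minimum number of single-element insertions, deletions and substitutions needed to turn one sequence into the other. PHP has a built-in levenshtein() function (before PHP 8.0 it only accepted strings of up to 255 characters). A sketch of the classic two-row dynamic programming version; turning the distance into a percentage by dividing by the longer length is an assumption chosen for illustration. Because it accepts any sequence, it can also be run on lists of words, which is much cheaper than characters for long texts.

```python
from collections.abc import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (strings, or lists of words)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (item_a != item_b),   # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity_percent(a: Sequence, b: Sequence) -> float:
    """100% for identical inputs, 0% when the distance equals the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

Usage on words: levenshtein("the cat sat".split(), "the dog sat".split()) gives 1, one substituted word.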
Coming back to the question after thinking over the answers a bit.
I tried similar_text, but it turned out to be too simple for my task. Let me be more specific.
Say I have 30 texts, divided into 5 categories. The task is to compare all 30 with each other and merge them into a smaller number of groups of similar texts. Language: PHP; database: MySQL.
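One way to merge the texts into groups, sketched under assumptions: the texts are already loaded into a dict id → text (hypothetical), similarity is the Jaccard index of the word sets, and the 0.5 threshold is an arbitrary starting value to tune. Every pair at or above the threshold is linked, and union-find collects the linked texts into groups (single-linkage clustering).

```python
import re
from itertools import combinations


def word_set(text: str) -> set[str]:
    """Lowercased set of words in the text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words in both texts."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(texts: dict[int, str], threshold: float = 0.5) -> list[list[int]]:
    """Put every pair with similarity >= threshold into the same group."""
    parent = {key: key for key in texts}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    words = {key: word_set(text) for key, text in texts.items()}
    for a, b in combinations(texts, 2):
        if jaccard(words[a], words[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[int, list[int]] = {}
    for key in texts:
        groups.setdefault(find(key), []).append(key)
    return sorted(groups.values())
```

Single linkage can chain: if A is similar to B and B to C, all three land in one group even when A and C are not alike. A higher threshold gives more, tighter groups.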