PHP
Kirill Sirenko, 2013-04-22 08:49:08

Comparing a large number of texts (PHP + MySQL)

Good afternoon!
On one of my projects I need to compare texts stored in a database.
I tried similar_text, but it turned out to be too simple for my task. Let me be more specific.
I have, for example, 30 texts, divided into 5 categories. The task: compare all 30 with one another and merge them into a smaller number of groups of similar texts. Language: PHP, database: MySQL.
What is the best approach to use?


3 answers
Max, 2013-04-22
@7workers

Try comparing words rather than characters. In general, this is a Bayesian categorization problem. But if you really have about 30 texts and not 30 thousand, it is easier to do it by hand :)
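
To illustrate the "words, not characters" idea, here is a minimal sketch that scores two texts by the share of words they have in common (Jaccard similarity). The function names `wordSet` and `wordSimilarity` are hypothetical, and the sketch assumes the mbstring extension is available and the texts are UTF-8; it is one possible word-level measure, not the only one.

```php
<?php
// Hypothetical helper: turn a text into a set of distinct lowercase words.
function wordSet(string $text): array
{
    // \p{L} and \p{N} match letters and digits in any script (requires the /u flag).
    $words = preg_split(
        '/[^\p{L}\p{N}]+/u',
        mb_strtolower($text, 'UTF-8'),
        -1,
        PREG_SPLIT_NO_EMPTY
    );
    return array_unique($words);
}

// Jaccard similarity: shared words divided by all distinct words, in [0, 1].
function wordSimilarity(string $a, string $b): float
{
    $setA = wordSet($a);
    $setB = wordSet($b);
    $union = count(array_unique(array_merge($setA, $setB)));
    if ($union === 0) {
        return 0.0; // both texts are empty
    }
    return count(array_intersect($setA, $setB)) / $union;
}
```

Unlike similar_text, this ignores word order and punctuation, so two texts that use the same vocabulary in a different order still score high.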

MaxUp, 2013-04-22
@MaxUp

You could take a look at Simple NaiveBayesClassifier for PHP.
A good series of articles about Bayesian categorization in PHP:
Implement Bayesian inference using PHP
Also, a recent article on Habr: Probabilistic models: Bayesian networks
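
For a sense of what such a classifier does internally, below is a minimal from-scratch sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the API of the library mentioned above; the class name `TinyNaiveBayes` and its methods are hypothetical, and it assumes PHP 7.4+ with the mbstring extension.

```php
<?php
// Minimal multinomial naive Bayes sketch; all names here are hypothetical.
class TinyNaiveBayes
{
    private array $wordCounts = []; // category => [word => count]
    private array $totalWords = []; // category => total words seen
    private array $docCounts = [];  // category => number of training texts
    private array $vocabulary = []; // word => true

    private function tokenize(string $text): array
    {
        return preg_split(
            '/[^\p{L}\p{N}]+/u',
            mb_strtolower($text, 'UTF-8'),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
    }

    // Record one labelled example text for a category.
    public function train(string $category, string $text): void
    {
        $this->docCounts[$category] = ($this->docCounts[$category] ?? 0) + 1;
        foreach ($this->tokenize($text) as $word) {
            $this->wordCounts[$category][$word] =
                ($this->wordCounts[$category][$word] ?? 0) + 1;
            $this->totalWords[$category] = ($this->totalWords[$category] ?? 0) + 1;
            $this->vocabulary[$word] = true;
        }
    }

    // Return the most probable category, or null if nothing was trained.
    public function classify(string $text): ?string
    {
        $best = null;
        $bestScore = -INF;
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = max(1, count($this->vocabulary));
        $words = $this->tokenize($text);
        foreach ($this->docCounts as $category => $docs) {
            // Log prior plus log likelihoods, with add-one (Laplace) smoothing.
            $score = log($docs / $totalDocs);
            $total = $this->totalWords[$category] ?? 0;
            foreach ($words as $word) {
                $count = $this->wordCounts[$category][$word] ?? 0;
                $score += log(($count + 1) / ($total + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $category;
            }
        }
        return $best;
    }
}
```

Note that this approach needs labelled examples for every category up front, so it fits the case where the 5 categories are already known.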

alt-j, 2013-04-22
@alt-j

If the number of output classes is not known in advance, then Bayesian classification probably will not help you, and you should look at clustering instead.
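
As a rough sketch of the clustering route: the code below groups texts whose word-level similarity reaches a threshold, either directly or through a chain of similar texts (single-linkage clustering cut at that threshold). The function names, the choice of Jaccard similarity, and the threshold are all assumptions for illustration; the texts are assumed to be already loaded from MySQL into an array keyed by row id, and the mbstring extension is assumed to be available.

```php
<?php
// Jaccard similarity over distinct lowercase words (hypothetical helper).
function jaccard(string $a, string $b): float
{
    $split = function (string $text): array {
        return array_unique(preg_split(
            '/[^\p{L}\p{N}]+/u',
            mb_strtolower($text, 'UTF-8'),
            -1,
            PREG_SPLIT_NO_EMPTY
        ));
    };
    $setA = $split($a);
    $setB = $split($b);
    $union = count(array_unique(array_merge($setA, $setB)));
    return $union === 0 ? 0.0 : count(array_intersect($setA, $setB)) / $union;
}

// Group text ids into clusters; $texts is [id => text].
function clusterTexts(array $texts, float $threshold): array
{
    $ids = array_keys($texts);
    // Union-find: every text starts as the root of its own cluster.
    $parent = array_combine($ids, $ids);
    $find = function ($id) use (&$parent) {
        while ($parent[$id] !== $id) {
            $id = $parent[$id];
        }
        return $id;
    };

    // Compare every pair once and merge clusters of similar texts.
    $count = count($ids);
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            if (jaccard($texts[$ids[$i]], $texts[$ids[$j]]) >= $threshold) {
                $parent[$find($ids[$i])] = $find($ids[$j]);
            }
        }
    }

    // Collect ids by their cluster root.
    $clusters = [];
    foreach ($ids as $id) {
        $clusters[$find($id)][] = $id;
    }
    return array_values($clusters);
}
```

The pairwise loop is quadratic in the number of texts, which is fine for 30 texts but would need a different strategy for tens of thousands. The threshold has to be tuned by eye on real data.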
