Comparing text data using PHP?
Can you recommend a ready-made PHP solution that takes two texts and returns the difference between them, that is, a text comparator written in PHP?
A solution that handles UTF-8 is preferable, but I would appreciate any suggestions, including links to descriptions of text comparison algorithms or to Java/C++ implementations.
I need to determine the exact difference between large texts. I am not trying to measure how similar lines or texts are.
I think this is what you need:
easywebscripts.net/php/php_text_differences.php
/**
 * Highlights the differences between two texts (at line or word granularity).
 * Changes are wrapped in a "span" tag with the classes 'added', 'deleted', 'changed'.
 * Algorithm: http://easywebscripts.net/php/php_text_differences.php
 *
 * @return array|false - texts A and B with markup, or FALSE if any argument is not a string
 * @param string $textA
 * @param string $textB
 * @param string $delimeter - " " (space): detect changes word by word; "\n": detect changes line by line
 */
function getTextDiff($textA, $textB, $delimeter = "\n") {
    if (!is_string($textA) || !is_string($textB) || !is_string($delimeter)) {
        return FALSE;
    }
    // Build the table of unique words (lines)
    $arrA = explode($delimeter, str_replace("\r", "", $textA));
    $arrB = explode($delimeter, str_replace("\r", "", $textB));
    $unickTable = array_unique(array_merge($arrA, $arrB));
    $unickTableFlip = array_flip($unickTable);
    // Convert both texts into sequences of identifiers
    $arrAid = $arrBid = array();
    foreach ($arrA as $v) {
        $arrAid[] = $unickTableFlip[$v];
    }
    foreach ($arrB as $v) {
        $arrBid[] = $unickTableFlip[$v];
    }
    // Find the longest common subsequence
    $maxLen = array();
    for ($i = 0, $x = count($arrAid); $i <= $x; $i++) {
        $maxLen[$i] = array();
        for ($j = 0, $y = count($arrBid); $j <= $y; $j++) {
            // Initialize with 0, not '': the cells are added together below,
            // and PHP 8 throws a TypeError for arithmetic on an empty string
            $maxLen[$i][$j] = 0;
        }
    }
    // $maxLen[$i][$j] is the LCS length of the tails starting at positions $i and $j
    for ($i = count($arrAid) - 1; $i >= 0; $i--) {
        for ($j = count($arrBid) - 1; $j >= 0; $j--) {
            if ($arrAid[$i] == $arrBid[$j]) {
                $maxLen[$i][$j] = 1 + $maxLen[$i+1][$j+1];
            } else {
                $maxLen[$i][$j] = max($maxLen[$i+1][$j], $maxLen[$i][$j+1]);
            }
        }
    }
    // Walk the table from the start to recover the subsequence itself
    $longest = array();
    for ($i = 0, $j = 0; $maxLen[$i][$j] != 0 && $i < $x && $j < $y;) {
        if ($arrAid[$i] == $arrBid[$j]) {
            $longest[] = $arrAid[$i];
            $i++;
            $j++;
        } else {
            if ($maxLen[$i][$j] == $maxLen[$i+1][$j]) {
                $i++;
            } else {
                $j++;
            }
        }
    }
    // Compare each sequence with the common subsequence and look for changes
    $arrBidDiff = array();
    $i1 = 0; $i2 = 0;
    for ($i = 0, $iters = count($arrBid); $i < $iters; $i++) {
        $simbol = array();
        if (isset($longest[$i1]) && $longest[$i1] == $arrBid[$i2]) {
            $simbol[] = $longest[$i1];
            $simbol[] = "*";
            $arrBidDiff[] = $simbol;
            $i1++;
            $i2++;
        } else {
            $simbol[] = $arrBid[$i2];
            $simbol[] = "+";
            $arrBidDiff[] = $simbol;
            $i2++;
        }
    }
    $arrAidDiff = array();
    $i1 = 0; $i2 = 0;
    for ($i = 0, $iters = count($arrAid); $i < $iters; $i++) {
        $simbol = array();
        if (isset($longest[$i1]) && $longest[$i1] == $arrAid[$i2]) {
            $simbol[] = $longest[$i1];
            $simbol[] = "*";
            $arrAidDiff[] = $simbol;
            $i1++;
            $i2++;
        } else {
            $simbol[] = $arrAid[$i2];
            $simbol[] = "-";
            $arrAidDiff[] = $simbol;
            $i2++;
        }
    }
    // Convert the identifiers back to text
    $arrAdiff = array();
    foreach ($arrAidDiff as $v) {
        $arrAdiff[] = array(
            $unickTable[$v[0]],
            $v[1],
        );
    }
    $arrBdiff = array();
    foreach ($arrBidDiff as $v) {
        $arrBdiff[] = array(
            $unickTable[$v[0]],
            $v[1],
        );
    }
    // If the same position is marked "deleted" in text A and "added" in text B,
    // relabel it: B gets "changed" (m) and A is reset to "unchanged" (*)
    $max = max(count($arrAdiff), count($arrBdiff));
    for ($i1 = 0, $i2 = 0; $i1 < $max && $i2 < $max;) {
        if (!isset($arrAdiff[$i1]) || !isset($arrBdiff[$i2])) {
            // no action
        } elseif ($arrAdiff[$i1][1] == "-" && $arrBdiff[$i2][1] == "+" && $arrBdiff[$i2][0] != "") {
            $arrAdiff[$i1][1] = "*";
            $arrBdiff[$i2][1] = "m";
        } elseif ($arrAdiff[$i1][1] != "-" && $arrBdiff[$i2][1] == "+") {
            $i2++;
        } elseif ($arrAdiff[$i1][1] == "-" && $arrBdiff[$i2][1] != "+") {
            $i1++;
        }
        $i1++;
        $i2++;
    }
    // Wrap the changes in tags for later styling
    $textA = array();
    foreach ($arrAdiff as $v) {
        if ('+' == $v[1]) {
            $textA[] = '<span class="added">' . $v[0] . '</span>';
        } elseif ('-' == $v[1]) {
            $textA[] = '<span class="deleted">' . $v[0] . '</span>';
        } elseif ('m' == $v[1]) {
            $textA[] = '<span class="changed">' . $v[0] . '</span>';
        } else {
            $textA[] = $v[0];
        }
    }
    $textA = implode($delimeter, $textA);
    $textB = array();
    foreach ($arrBdiff as $v) {
        if ('+' == $v[1]) {
            $textB[] = '<span class="added">' . $v[0] . '</span>';
        } elseif ('-' == $v[1]) {
            $textB[] = '<span class="deleted">' . $v[0] . '</span>';
        } elseif ('m' == $v[1]) {
            $textB[] = '<span class="changed">' . $v[0] . '</span>';
        } else {
            $textB[] = $v[0];
        }
    }
    $textB = implode($delimeter, $textB);
    return array($textA, $textB);
}
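The core of the function above is a longest-common-subsequence (LCS) table followed by a walk through that table. A condensed sketch of the same idea may make it easier to follow. The name `lcsDiffSketch`, the `=`/`-`/`+` markers, and splitting on whitespace with the UTF-8-aware `/u` flag are assumptions made for this illustration, not part of the original answer.

```php
<?php
// Hypothetical helper for illustration only: word-level diff via an LCS table.
function lcsDiffSketch(string $a, string $b): array
{
    // The /u flag treats the input as UTF-8; PREG_SPLIT_NO_EMPTY drops empty tokens.
    $x = preg_split('/\s+/u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $y = preg_split('/\s+/u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($x);
    $m = count($y);

    // $len[$i][$j] = LCS length of the tails $x[$i..] and $y[$j..].
    $len = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $len[$i][$j] = ($x[$i] === $y[$j])
                ? $len[$i + 1][$j + 1] + 1
                : max($len[$i + 1][$j], $len[$i][$j + 1]);
        }
    }

    // Walk the table from the top-left corner and emit [marker, word] pairs.
    $ops = [];
    $i = $j = 0;
    while ($i < $n && $j < $m) {
        if ($x[$i] === $y[$j]) {
            $ops[] = ['=', $x[$i]];
            $i++;
            $j++;
        } elseif ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
            $ops[] = ['-', $x[$i++]];   // word exists only in the first text
        } else {
            $ops[] = ['+', $y[$j++]];   // word exists only in the second text
        }
    }
    // Whatever is left over on either side is a pure deletion or addition.
    while ($i < $n) { $ops[] = ['-', $x[$i++]]; }
    while ($j < $m) { $ops[] = ['+', $y[$j++]]; }
    return $ops;
}

foreach (lcsDiffSketch('naïve café menu', 'naïve bistro menu') as [$marker, $word]) {
    echo $marker, ' ', $word, PHP_EOL;
}
// Prints:
// = naïve
// - café
// + bistro
// = menu
```

Keep in mind that the table holds (n + 1) × (m + 1) cells, so memory grows with the product of the token counts. For large texts, comparing line by line rather than word by word keeps the table much smaller.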
int similar_text(string str_first, string str_second [, double percent])
This function measures how similar two strings are.
similar_text() computes the similarity of two strings using Oliver's algorithm. It returns the number of characters that match in str_first and str_second. The optional third parameter is passed by reference and receives the similarity as a percentage.
www.softtime.ru/bookphp/gl3_11.php
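A short call shows both values that similar_text() produces. The input strings are arbitrary examples. Note that the function gives a similarity score, not a list of differences, and that it counts bytes rather than multibyte characters, so the numbers for UTF-8 text may not match the visible character count.

```php
<?php
// similar_text() returns the number of matching characters (counted in bytes)
// and writes the similarity percentage into the third argument by reference.
$matched = similar_text('World', 'Word', $percent);

echo $matched, PHP_EOL;            // 4
echo round($percent, 2), PHP_EOL;  // 88.89
```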
If you need the exact line-by-line difference, use the command-line diff utility.
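One way to call diff from PHP is to write both texts to temporary files and capture the command's output. This is only a sketch: the function name `unifiedDiffSketch` is made up for the example, and it assumes a POSIX-like system where the `diff` utility is on the PATH and `shell_exec()` is not disabled.

```php
<?php
// Hypothetical wrapper for illustration: runs `diff -u` on two strings.
// Assumes the `diff` utility is installed and shell_exec() is allowed.
function unifiedDiffSketch(string $old, string $new): string
{
    $fileA = tempnam(sys_get_temp_dir(), 'dfa');
    $fileB = tempnam(sys_get_temp_dir(), 'dfb');
    file_put_contents($fileA, $old);
    file_put_contents($fileB, $new);
    try {
        $cmd = 'diff -u ' . escapeshellarg($fileA) . ' ' . escapeshellarg($fileB);
        // shell_exec() yields null when there is no output (identical files)
        // or when the command could not run; the cast turns that into ''.
        return (string) shell_exec($cmd);
    } finally {
        // Remove the temporary files whether or not the command succeeded.
        unlink($fileA);
        unlink($fileB);
    }
}

echo unifiedDiffSketch("one\ntwo\nthree\n", "one\n2\nthree\n");
```

The output is in unified diff format: lines removed from the first text start with `-`, lines added in the second start with `+`. Since diff compares lines as bytes, UTF-8 content passes through unchanged.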