Algorithm for matching two texts?

Anton Zhuchkov, 2021-12-15 00:02:19

Algorithms

There are two texts of the same document. I need to find fragments that match or nearly match. For example, one text has a header and comments and the other doesn't. I need to identify, preferably quickly, the fragments that are the same in both texts.

Finding fuzzy matches would be especially valuable. For example, one text was obtained through image recognition and is rather garbled in places.

Please point me in the right direction. What algorithms could be applied, and what should I read?

1 answer(s)

Armenian Radio, 2021-12-15
@gbg

Start with diff, then docdiff. The latter diffs Word files pretty well.
I forgot the main thing! A dissertation plagiarism detector!
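
To make the "start with diff" idea concrete, here is a rough sketch using Python's standard difflib module, which is one possible stand-in for the diff tool and not necessarily what the answer had in mind. It compares the two texts line by line, collects the lines that match exactly, and then scores the lines that differ to pick out near matches such as recognition errors. The function name matching_fragments and the 0.8 threshold are made up for this illustration.

```python
import difflib


def matching_fragments(text_a, text_b, threshold=0.8):
    """Return (exact, fuzzy) line matches between two texts.

    exact: lines that appear identically, in order, in both texts.
    fuzzy: (line_a, line_b, score) tuples for differing lines whose
           similarity ratio is at least `threshold` (an assumed cutoff).
    """
    a = text_a.splitlines()
    b = text_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)

    exact, fuzzy = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical run of lines in both texts.
            exact.extend(a[i1:i2])
        elif tag == "replace":
            # Lines differ: look for the closest counterpart of each line.
            for line_a in a[i1:i2]:
                scored = [
                    (difflib.SequenceMatcher(None, line_a, line_b).ratio(), line_b)
                    for line_b in b[j1:j2]
                ]
                score, best = max(scored)
                if score >= threshold:
                    fuzzy.append((line_a, best, score))
    return exact, fuzzy
```

Lines that exist in only one text (an extra header, comments) fall into the "delete" and "insert" opcodes and are simply skipped. For long documents the pairwise scoring inside "replace" blocks can get slow, so treat this as a starting point and tune the threshold on real data.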