go
inbider, 2017-09-19 14:17:34

How to determine the similarity (rewriting, uniqueness) of 2 texts in Go (Golang)?

Greetings to all!
I need to measure how similar two (or more) texts are to each other (rewrite detection, uniqueness). If anyone has dealt with a similar task, please share tips and links to libraries.
P.S. Thanks in advance!


2 answers
inbider, 2017-09-22
@inbider

The task turned out to be fairly non-trivial, and there are quite a few ways to approach it. For anyone interested, a good place to start digging is: https://4gophers.ru/articles/semanticheski-analiz-...
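For anyone who wants something concrete to start from in Go, here is a minimal sketch of one common baseline: split each text into overlapping word n-grams ("shingles") and compare the two sets with the Jaccard index. This is my own illustration, not necessarily what the linked article describes; the function names (`shingles`, `jaccard`, `similarity`) and the shingle size `k` are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// shingles returns the set of all k-word sequences found in text.
// Words are lowercased and split on anything that is not a letter or digit.
func shingles(text string, k int) map[string]struct{} {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	set := make(map[string]struct{})
	for i := 0; i+k <= len(words); i++ {
		set[strings.Join(words[i:i+k], " ")] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| for two shingle sets.
// Two empty sets are treated as identical (score 1).
func jaccard(a, b map[string]struct{}) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	inter := 0
	for s := range a {
		if _, ok := b[s]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return float64(inter) / float64(union)
}

// similarity scores two texts from 0 (no shared shingles) to 1 (same shingle set).
func similarity(t1, t2 string, k int) float64 {
	return jaccard(shingles(t1, k), shingles(t2, k))
}

func main() {
	a := "The quick brown fox jumps over the lazy dog"
	b := "The quick brown fox leaps over the lazy dog"
	fmt.Printf("%.3f\n", similarity(a, b, 2)) // prints 0.600
}
```

A larger `k` makes the score stricter about word order, while `k = 1` reduces it to plain vocabulary overlap. For more than two texts, the same function can be called for each pair.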

asd111, 2017-09-19
@asd111

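Python has difflib. The code below has not been tested but should work.

from difflib import SequenceMatcher

# Read both files into strings; SequenceMatcher compares sequences, not file names.
with open("text_1.txt", encoding="utf-8") as f1, open("text_2.txt", encoding="utf-8") as f2:
    text_1 = f1.read()
    text_2 = f2.read()

# The first argument (isjunk) marks spaces as junk so they are ignored when matching.
s = SequenceMatcher(lambda x: x == " ", text_1, text_2)
print(round(s.ratio(), 3))  # a number from 0 to 1: 0 means nothing in common, 1 means identical text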

In Python, the whole thing is also easy to parallelize.
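Since the question was about Go, here is a rough sketch of a similar score there. difflib's `ratio()` is defined as 2*M/T, where M is the number of matched elements and T is the total length of both sequences. The sketch below uses the same formula, but finds M as the longest common subsequence of words, which is not the matching algorithm difflib uses, so the numbers can differ from Python's. The names `lcsLen` and `ratio` are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// lcsLen returns the length of the longest common subsequence of a and b,
// using the classic dynamic-programming table reduced to two rows.
func lcsLen(a, b []string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				curr[j] = prev[j-1] + 1
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j]
			default:
				curr[j] = curr[j-1]
			}
		}
		// The row just filled becomes the previous row for the next word of a.
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// ratio returns 2*M/T over whitespace-separated words, where M is the LCS
// length and T is the total word count. Two empty texts score 1.
func ratio(t1, t2 string) float64 {
	a := strings.Fields(t1)
	b := strings.Fields(t2)
	if len(a)+len(b) == 0 {
		return 1
	}
	return 2 * float64(lcsLen(a, b)) / float64(len(a)+len(b))
}

func main() {
	fmt.Printf("%.3f\n", ratio("a b c d", "a c d e")) // prints 0.750
}
```

The table costs time proportional to the product of the two word counts, so for long texts it is worth comparing paragraph by paragraph, and each pair of texts can be scored in its own goroutine.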
