How to determine text similarity?
Let's say we have tweets or article headlines. I would like to detect that, say, 10 news items or tweets refer to the same thing (for example, the same company or event). How is this done? This is probably a naive question, but what is this class of tasks even called? This is my first time dealing with it.
By the way, I believe news aggregators do something like this, i.e. they group items somehow, right?
This is done by more than one function: entities are extracted, texts are compared, and so on. See https://tech.yandex.ru/tomita/
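For the "texts are compared" step, one common baseline is TF-IDF vectors plus cosine similarity. The sketch below is a minimal stdlib-only illustration of that technique, not how the Tomita parser works; the helper names `tokenize`, `tfidf_vectors` and `cosine` are hypothetical. Words shared by two headlines raise their score, and rare words count more than words that appear everywhere.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only word tokens.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many texts each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: words present in every text still keep a small weight.
        vectors.append(
            {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine of the angle between two sparse word-weight vectors.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, with `vecs = tfidf_vectors(["Apple unveils new iPhone", "New iPhone presented by Apple", "Oil prices fall again"])`, the first two headlines get a clearly positive score, while the first and third share no words and score 0.0. Pure word overlap misses synonyms ("unveils" vs "presented"), which is one reason entity extraction and similar steps are added on top.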
This is called thematic clustering: you keep a record of synonyms and their mutual "weights", which depend on whether certain other words occur nearby in a related chain (a publication, its comments, or a single sentence).
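A heavily simplified sketch of the clustering idea, assuming a fixed, hand-written synonym table (`SYNONYMS` is hypothetical) instead of the context-dependent weights described above. Each text is reduced to a set of canonical words, and a greedy single pass puts each text into the first cluster whose first member overlaps enough (Jaccard similarity at or above an assumed threshold of 0.3).

```python
import re

# Hypothetical synonym table: maps word variants to one canonical term.
SYNONYMS = {
    "unveils": "presents",
    "presented": "presents",
    "introduces": "presents",
}


def normalize(text: str) -> set[str]:
    # Lowercase, split into words, and replace known synonyms.
    words = re.findall(r"\w+", text.lower())
    return {SYNONYMS.get(w, w) for w in words}


def jaccard(a: set[str], b: set[str]) -> float:
    # Share of words common to both sets.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(texts: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Greedy single-pass clustering: each text joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    sets = [normalize(t) for t in texts]
    clusters: list[list[int]] = []
    for i, s in enumerate(sets):
        for c in clusters:
            if jaccard(s, sets[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

With the headlines "Apple unveils new iPhone", "New iPhone presented by Apple", "Oil prices fall again" and "Oil prices fall for third day", this groups indices as `[[0, 1], [2, 3]]`. Comparing only against a cluster's first member keeps the sketch short; real systems usually use proper clustering algorithms and learned similarity.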
This can be done by extracting entities (nouns and proper names: people's full names, names of organizations and other things, etc.) and extracting contextual dependencies.
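For the entity part only, a very simple approach is dictionary (gazetteer) matching: known names and their aliases are mapped to one canonical entity, and texts that mention the same entity are grouped together. The sketch below assumes a small hypothetical `GAZETTEER`; it does not handle unknown names or the contextual dependencies mentioned above, which real named-entity recognition tools address.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: lowercase surface forms mapped to a canonical entity.
GAZETTEER = {
    "apple": "Apple Inc.",
    "apple inc": "Apple Inc.",
    "tim cook": "Tim Cook",
    "opec": "OPEC",
}


def extract_entities(text: str) -> set[str]:
    # Normalize to lowercase words separated by single spaces.
    lowered = " ".join(re.findall(r"\w+", text.lower()))
    found = set()
    for alias, entity in GAZETTEER.items():
        # Whole-word match only, so "apple" does not fire inside "pineapple".
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            found.add(entity)
    return found


def group_by_entity(texts: list[str]) -> dict[str, list[int]]:
    # Map each canonical entity to the indices of the texts that mention it.
    groups: dict[str, list[int]] = defaultdict(list)
    for i, text in enumerate(texts):
        for entity in sorted(extract_entities(text)):
            groups[entity].append(i)
    return dict(groups)
```

For "Apple unveils new iPhone", "Tim Cook speaks at Apple event", "OPEC cuts output" and "Pineapple prices rise", the first two texts end up under "Apple Inc.", the second also under "Tim Cook", the third under "OPEC", and the fourth under nothing.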
You can run a similar search over such chains here.