Fuzzy search?
There are two strings: the first is short (1-3 words) and the second is long (10-20 words). I need to determine whether the first string occurs in the second, or what percentage of it is present there. Please recommend algorithms :)
I once wrote a thesis on this topic. It turned out that the best way to compare Russian words is by the length of their longest common prefix, taken as a percentage of the length of the shorter word; two words count as a match when that percentage is above a threshold. To compare sentences, compare the words of the two strings pairwise and derive the similarity function from the distances between similar words.
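A minimal sketch of the prefix comparison described above. The function names and the 0.7 threshold are illustrative assumptions, not values from the thesis; the threshold would have to be tuned on real data.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_similarity(a: str, b: str) -> float:
    """Common prefix length as a fraction of the shorter word's length."""
    a, b = a.lower(), b.lower()
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return common_prefix_len(a, b) / shorter


def same_word(a: str, b: str, threshold: float = 0.7) -> bool:
    # threshold=0.7 is an assumed example value, not taken from the source
    return prefix_similarity(a, b) >= threshold


print(prefix_similarity("книга", "книги"))  # shared prefix "книг", 4 of 5 letters
print(same_word("книга", "кошка"))          # only "к" is shared
```

This works for inflected languages because endings change while the stem at the start of the word mostly stays the same.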
Try calculating the Levenshtein distance.
Example: bytes.com/topic/python/answers/580959-fuzzy-string-comparison
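For reference, a sketch of the Levenshtein distance using the usual two-row dynamic programming table. The normalisation to a 0..1 similarity in levenshtein_similarity is one common convention and an assumption here, not something the answer above specifies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```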
There are many different distance measures between words. I would split the phrases into words, find for each word of the short phrase its maximum pairwise similarity with the words of the long phrase, and take the average of those maxima.
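Roughly, the averaging idea could look like this. The word measure here is difflib.SequenceMatcher.ratio() from the Python standard library, chosen only for convenience; any other word similarity can be swapped in. The name phrase_score is hypothetical.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    # Stand-in measure (an assumption); replace with any word similarity in 0..1.
    return SequenceMatcher(None, a, b).ratio()


def phrase_score(short: str, long: str) -> float:
    """Average, over the words of `short`, of the best match found in `long`."""
    short_words = short.lower().split()
    long_words = long.lower().split()
    if not short_words or not long_words:
        return 0.0
    best = [max(word_similarity(s, w) for w in long_words) for s in short_words]
    return sum(best) / len(best)


print(phrase_score("quick fox", "the quick brown fox jumps"))  # 1.0
print(phrase_score("quik fox", "the quick brown fox jumps"))
```

The result can be read as the "percentage" of the short string found in the long one, which is what the question asks for.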
You can compare using the trigram method. It gives a usable result even when the words have different endings, and so on.
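A sketch of trigram comparison. Two details are assumptions made for this example: the padding (two spaces before the text, one after, so that short words and word starts still produce trigrams) and the use of the Jaccard coefficient to turn the two trigram sets into a score.

```python
def trigrams(text: str) -> set:
    """Set of 3-character substrings of the padded, lowercased text."""
    text = text.lower().strip()
    if not text:
        return set()
    padded = f"  {text} "  # padding convention is an assumption
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard coefficient of the two trigram sets, in 0..1."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(trigram_similarity("running", "runner"))  # shares the trigrams of "runn"
print(trigram_similarity("running", "walked"))  # 0.0
```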
Traditionally, the Levenshtein distance.
But I would recommend using the longest common subsequence. You can also introduce a penalty for gaps between words.
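One possible reading of "longest common subsequence with a gap penalty" is a local alignment in which matches score +1, substitutions are not allowed, and every skipped element inside the aligned region costs gap_penalty. The sketch below makes that assumption; the name gapped_lcs_score and the 0.5 default are illustrative. With gap_penalty=0 the result is the plain LCS length.

```python
from typing import Sequence


def gapped_lcs_score(a: Sequence, b: Sequence, gap_penalty: float = 0.5) -> float:
    """Best score of a common subsequence: +1 per match, -gap_penalty per
    element skipped between matches. Leading and trailing skips are free."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, start=1):
            # Skip an element of a or of b, or restart the alignment at 0.
            score = max(0.0, prev[j] - gap_penalty, cur[j - 1] - gap_penalty)
            if x == y:
                score = max(score, prev[j - 1] + 1.0)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


# Works on characters or on lists of words.
print(gapped_lcs_score("abc", "a-b-c"))  # 2.0: three matches, two gaps
print(gapped_lcs_score("fuzzy search".split(),
                       "a fuzzy string search demo".split()))  # 1.5
```

Dividing the score by the length of the short sequence gives a value that can be compared against a threshold.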