Are there databases with fuzzy full text search?
I have a set of about one hundred thousand short blocks of text in Russian, roughly 500 characters each.
A similar block of text is given as input, and I need to find its match in the original set, or establish that there is none.
As a rule, an input block either has no counterpart in the original set at all, or coincides exactly with some block (the trivial case), or differs slightly (anything from punctuation and spelling to word order and the absence or presence of small fragments of text). Functionally, TheFuzz copes with the task quite well, using Levenshtein distance plus tokenization.
But running a full, unindexed Python loop over hundreds of thousands of comparisons for every request is not the most efficient approach.
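For illustration, here is a minimal sketch of the kind of brute-force scan I mean. It is not my actual code: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for TheFuzz's Levenshtein-based scoring, and the names `normalize`, `similarity`, `best_match` and the 0.8 threshold are made up for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase, drop punctuation, and sort the tokens so that
    # word order does not affect the comparison.
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(sorted(tokens))


def similarity(a, b):
    # Similarity ratio in [0, 1]; a stand-in for a Levenshtein-based score.
    # autojunk=False avoids difflib's heuristic for sequences of 200+ items.
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def best_match(query, corpus, threshold=0.8):
    # Full scan: one comparison per stored block, for every request.
    best_text, best_score = None, 0.0
    for candidate in corpus:
        score = similarity(query, candidate)
        if score > best_score:
            best_text, best_score = candidate, score
    if best_score >= threshold:
        return best_text, best_score
    return None, best_score  # nothing similar enough in the set
```

The cost is linear in the size of the set for every single query, which is exactly the part I would like a database index to take over.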
Perhaps there are some databases that can do such a search out of the box?