Are there databases with fuzzy full text search?

Python

Zolg, 2022-03-31 11:56:38

I have a set of roughly one hundred thousand short blocks of text (in Russian, about 500 characters each).
A similar block of text arrives as input, and I need to find its match in the original set, or determine that there is none.
As a rule, an input block either has no counterpart in the original set at all, or matches some block exactly (the trivial case), or differs slightly (anything from punctuation and spelling to the order of small text fragments or their absence or presence). Functionally, TheFuzz copes with the task quite well, using Levenshtein distance plus tokenization. But running a full, non-indexed Python loop over the entire set for every request is not the most efficient approach.
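
For reference, the brute-force approach described above looks roughly like the sketch below. To keep it runnable with the standard library alone, it swaps TheFuzz for difflib.SequenceMatcher and imitates token-based scoring by sorting the tokens before comparing, so the scores will not be identical to TheFuzz's. The names normalize and best_match, the 0.8 threshold, and the two sample blocks are illustrative assumptions, not the actual code or data.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order matters less."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so Cyrillic works
    return " ".join(sorted(tokens))


def best_match(
    query: str, corpus: List[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return (block, score) for the closest block, or None if nothing reaches threshold.

    This is a linear scan: each request compares the query with every stored block,
    which is exactly the cost an indexed search would avoid.
    """
    norm_query = normalize(query)
    best_block, best_score = None, 0.0
    for block in corpus:
        # autojunk=False: the default heuristic can distort scores on longer strings
        matcher = SequenceMatcher(None, norm_query, normalize(block), autojunk=False)
        score = matcher.ratio()
        if score > best_score:
            best_block, best_score = block, score
    if best_block is not None and best_score >= threshold:
        return best_block, best_score
    return None


# Hypothetical sample data standing in for the real set of ~100,000 blocks.
corpus = [
    "Съешь ещё этих мягких французских булок, да выпей чаю.",
    "В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!",
]

# Small spelling and punctuation differences: should still find the first block.
print(best_match("Съешь еще этих мягких французских булок да выпей чаю", corpus))

# Unrelated text: should report no match.
print(best_match("Совершенно другой текст без аналогов", corpus))
```

In a real run the normalized form of each stored block would be computed once and cached, but even then the number of comparisons per request grows linearly with the size of the set.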

Perhaps there are some databases that can do such a search out of the box?