Python
Zolg, 2022-03-31 11:56:38

Are there databases with fuzzy full text search?

There is a set of about one hundred thousand short blocks of Russian text, each around 500 characters long.
A similar block of text is given as input, and I need to find its match in the original set, or determine that there is none.
As a rule, an input block either has no analogue in the original set at all, or coincides exactly with some block (the trivial case), or differs slightly: anything from punctuation and spelling to word order or the presence or absence of small fragments of text. Functionally, TheFuzz copes with the task quite well, using Levenshtein distance plus tokenization. But running a full, non-indexed Python loop of roughly a hundred thousand comparisons for each request is not the most efficient approach.
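
For reference, the brute-force approach described above might look roughly like the sketch below. This is not the actual code in use: it swaps TheFuzz for the standard library's `difflib.SequenceMatcher` so that it runs without extra dependencies, and the names `normalize`, `best_match` and the `threshold` value of 0.85 are assumptions made for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase, drop punctuation, and sort the tokens so that
    # punctuation and word-order differences do not affect the score.
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(sorted(tokens))


def best_match(query, corpus, threshold=0.85):
    # Full scan: the query is compared against every stored block,
    # which is exactly the cost that an index would avoid.
    q = normalize(query)
    best_text, best_score = None, 0.0
    for candidate in corpus:
        # autojunk=False disables difflib's heuristic that treats frequent
        # characters as junk in sequences of 200 items or more.
        matcher = SequenceMatcher(None, q, normalize(candidate), autojunk=False)
        score = matcher.ratio()
        if score > best_score:
            best_text, best_score = candidate, score
    if best_score >= threshold:
        return best_text, best_score
    # Nothing similar enough: report "no match" along with the best score seen.
    return None, best_score
```

Note that `SequenceMatcher.ratio()` is a similarity ratio, not a Levenshtein distance, so the scores will not be identical to what TheFuzz reports; the shape of the loop is the point here.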

Perhaps there are some databases that can do such a search out of the box?
