How to perform a fuzzy search for key phrases in the text?
There is a text in Russian, written in free form and possibly containing typos. Certain facts, each described by a key phrase, need to be extracted from it algorithmically.
Let's say we are looking for mentions of a date by the phrase сегодня в ЧЧ:ММ ("today at HH:MM"). A strict match can be found with a regular expression, but that will miss variant spellings and wordings: севодня в ЧЧ:ММ, сегодняв ЧЧ:ММ, в ЧЧ:ММ сегодня, сегодня в полдень, and so on.
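To make the problem concrete, here is a minimal sketch of the strict approach. It is written in Python only for brevity (the same pattern works with .NET's Regex class); the pattern, the helper name find_strict, and the sample sentences are assumptions made up for illustration, not part of the original question.

```python
import re

# Strict pattern for the key phrase "сегодня в ЧЧ:ММ" ("today at HH:MM").
# Illustrative assumption: exactly two digits for hours and for minutes.
STRICT_PATTERN = re.compile(r"сегодня в \d{2}:\d{2}")


def find_strict(text: str) -> list[str]:
    """Return every exact occurrence of the key phrase found in `text`."""
    return [m.group(0) for m in STRICT_PATTERN.finditer(text)]


# Hypothetical sample sentences: one exact spelling plus the four
# variants listed in the question.
samples = [
    "Встреча сегодня в 14:30",    # exact spelling
    "Встреча севодня в 14:30",    # typo inside the word
    "Встреча сегодняв 14:30",     # missing space
    "Встреча в 14:30 сегодня",    # different word order
    "Встреча сегодня в полдень",  # time given as a word
]

for sample in samples:
    print(sample, "->", find_strict(sample))
```

Only the first sample produces a match; the other four return an empty list, which is exactly the gap the question is about.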
Options that came to mind:
* Regular-expression search over a phonetic hash (Metaphone / Soundex)
* Regular-expression search over text passed through a stemmer / lemmatizer
* Pure full-text search (Lucene.Net)
Is there any out-of-the-box way to do this fairly well using the .NET stack? Paid services/libraries are also considered.
One option that comes to mind: take a candidate string and compare it with the target phrase character by character.
If a character matches the character at the same position in the target phrase, score 1; otherwise score 0. If a large enough percentage of the target phrase is matched, accept the candidate.
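A rough sketch of this idea, again in Python for brevity, could look like the following. The function names and the 0.8 threshold are assumptions chosen for illustration; the answer itself only says "some percentage".

```python
def positional_match_ratio(candidate: str, target: str) -> float:
    """Share of positions in `target` where `candidate` has the same character.

    Each position scores 1 on a match and 0 otherwise. Characters of
    `candidate` beyond the length of `target` are ignored.
    """
    if not target:
        return 0.0
    hits = sum(1 for c, t in zip(candidate, target) if c == t)
    return hits / len(target)


def is_match(candidate: str, target: str, threshold: float = 0.8) -> bool:
    """Accept `candidate` if enough of `target` matched (threshold is an assumption)."""
    ratio = positional_match_ratio(candidate.lower(), target.lower())
    return ratio >= threshold


# A substituted letter costs one position: 6 of 7 characters still match.
print(is_match("севодня", "сегодня"))

# A dropped letter shifts every later character, so only 1 of 7 matches.
print(is_match("сгодня", "сегодня"))
```

The second call shows the main weakness of a purely positional comparison: a single missing or extra letter misaligns everything after it, so the candidate is rejected even though it is one typo away. A shift-tolerant measure such as edit distance does not have this problem.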