data mining
Andrew, 2017-07-15 23:06:20

How to perform a fuzzy search for key phrases in the text?

I have a text in Russian. It is written in free form and may contain typos. I need to algorithmically extract certain facts from it, where each fact is described by a key phrase.
Say we are looking for mentions of a date using the phrase сегодня в ЧЧ:ММ. A regular expression finds an exact match, but it misses variant spellings: севодня в ЧЧ:ММ, сегодняв ЧЧ:ММ, в ЧЧ:ММ сегодня, сегодня в полдень, and so on.
Options that came to mind:
* Ordinary search over a phonetic hash (Metaphone / Soundex)
* Ordinary search over text passed through a stemmer / lemmatizer
* Pure full-text search (Lucene.Net)
Is there an out-of-the-box way to do this reasonably well on the .NET stack? Paid services and libraries are also an option.


1 answer(s)
Pavel Mikhalovsky, 2017-07-16
@pavel9609

One option that comes to mind: take a string from the text and compare it with the target phrase character by character.
For each position, record 1 if the character matches the corresponding character of the target phrase and 0 if it does not. If the share of matching positions reaches some chosen percentage of the target phrase, accept it as a match.
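
The idea above could be sketched roughly as follows. This is only an illustration in Python, not .NET code and not a tested production approach: the function names `match_ratio` and `find_fuzzy`, the sliding window over the text, the lowercasing, and the default threshold of 0.8 are all assumptions added for the example.

```python
def match_ratio(candidate: str, phrase: str) -> float:
    """Share of positions in `phrase` where `candidate` has the same character."""
    if not phrase:
        return 0.0
    # 1 for a matching position, 0 otherwise; positions past the shorter string count as 0.
    hits = sum(1 for a, b in zip(candidate, phrase) if a == b)
    return hits / len(phrase)


def find_fuzzy(text: str, phrase: str, threshold: float = 0.8) -> list[tuple[int, str, float]]:
    """Slide a window of len(phrase) over `text`; keep windows scoring >= threshold.

    The window and the threshold value are assumptions of this sketch.
    """
    target = phrase.lower()
    n = len(target)
    results = []
    for start in range(max(len(text) - n + 1, 0)):
        window = text[start:start + n]
        score = match_ratio(window.lower(), target)
        if score >= threshold:
            results.append((start, window, score))
    return results
```

For example, `find_fuzzy("встреча севодня в 10:00", "сегодня в")` accepts the window `севодня в`, because 8 of 9 positions match. Note the limitation: a position-by-position comparison only tolerates substituted characters. An inserted or dropped character shifts everything after it, so `сегодняв ` scores only 7 of 9 against `сегодня в`, and reordered variants such as в ЧЧ:ММ сегодня are not found at all.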
