Algorithms
letitbeniubi, 2019-05-08 09:44:35

How to search for cognate words in Russian texts?

Hello, I can't find a good way to search for single-root (cognate) words in Russian texts.
What I tried:
- stemming algorithms (the wamania and ladamalina libraries): these stemmers are based on the Porter algorithm, cannot remove prefixes, and generally work poorly;
- the Wikimedia API: it is slow and, besides, not all words are there (Wiktionary knows where the root of a word is, but it is missing many words);
- I could not find a suitable version of Tikhonov's dictionary (or any other dictionary with morpheme division).
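To make the prefix problem concrete, here is a minimal sketch of the rule-based alternative: Porter-style stemmers only cut endings, so this one also strips at most one known prefix before grouping words by what is left. The prefix and suffix lists are tiny hypothetical samples chosen for the example, not a real description of Russian morphology.

```python
from collections import defaultdict

# Tiny, hypothetical rule lists for illustration only; a usable rule set for
# Russian would be far larger and would still mis-split many words.
PREFIXES = ["пере", "при", "пре", "раз", "по", "за", "вы", "на", "от"]
SUFFIXES = ["ность", "ивый", "ный", "ить", "ка", "ый", "ая", "ое", "а", "ы", "и"]
MIN_ROOT = 3  # never cut a word down to fewer than 3 letters


def guess_root(word: str) -> str:
    """Strip at most one known prefix and one known suffix/ending."""
    w = word.lower()
    # Try longer affixes first so the longest matching one wins.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if w.startswith(p) and len(w) - len(p) >= MIN_ROOT:
            w = w[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(s) and len(w) - len(s) >= MIN_ROOT:
            w = w[: -len(s)]
            break
    return w


def group_by_root(words):
    """Group words whose guessed roots are identical strings."""
    groups = defaultdict(list)
    for word in words:
        groups[guess_root(word)].append(word)
    return dict(groups)


print(group_by_root(["перекрасить", "покрасить", "краска"]))
```

With these lists, "перекрасить", "покрасить" and "краска" all reduce to "крас". The same rules will happily over- or under-strip other words, which is why hand-written rules quickly turn into a huge amount of work.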
Has anyone run into this problem?
Thanks for your answers.


3 answer(s)
Ruslan., 2019-05-08
@LaRN

This is a difficult task, and the solution also depends on the context in which a word is used.
See this link for an example:
https://en.wikipedia.org/wiki/%D0%9E%D0%BC%D0%BE%D...

dsadso, 2019-05-08
@dsadso

I'm currently writing my thesis in computational linguistics, specifically on splitting words into morphemes.
I came to the conclusion that it can't be done: you will never find cognates with 100% accuracy. Only a dictionary of ready-made cognate groups can help with that.
You can try to identify the roots of words:
1) Using a dictionary with words divided into morphemes (I found only one, and even it does not mark which morpheme is which: www.speakrus.ru/dict/. UPD: this is the same Tikhonov dictionary you mentioned).
2) Using rules, but compiling them is a titanic amount of work.
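Option 1 can be sketched as follows, assuming a hypothetical plain-text format "word: morph-morph-morph" (not the actual format of the speakrus.ru file). Because such a dictionary does not label the morphemes, the root still has to be guessed; here the heuristic is "the first segment that is not a known prefix", and both the prefix set and the sample entries are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical prefix set used only to skip over prefixes when guessing the root.
KNOWN_PREFIXES = {"пере", "пре", "при", "по", "на", "вы", "за", "от", "раз", "рас"}

# Hypothetical sample entries; morphemes are separated but NOT labelled.
SAMPLE_DICT = """\
перекрасить: пере-крас-и-ть
покрасить: по-крас-и-ть
прекрасный: пре-крас-н-ый
наука: на-ук-а
научить: на-уч-и-ть
"""


def parse_entries(text):
    """Return {word: [morphemes]} from lines in the hypothetical format."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        word, _, segmented = line.partition(":")
        entries[word.strip()] = segmented.strip().split("-")
    return entries


def guess_root(morphemes):
    """Heuristic: the root is the first morpheme that is not a known prefix."""
    for m in morphemes:
        if m not in KNOWN_PREFIXES:
            return m
    return morphemes[-1]


def group_by_root(entries):
    """Group dictionary words by their guessed root."""
    groups = defaultdict(list)
    for word, morphemes in entries.items():
        groups[guess_root(morphemes)].append(word)
    return dict(groups)


print(group_by_root(parse_entries(SAMPLE_DICT)))
```

Even with a segmented dictionary, root alternations such as "ук"/"уч" ("наука", "научить") end up in different groups, and any word missing from the dictionary cannot be handled at all.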
Say you have found the roots in the words "красный" (red) and "красивый" (beautiful): both give "kras". Are they cognates? I think not. So the meaning of the words matters too. You can come up with a few partial approaches, but without a dictionary of cognates compiled by a person you will not be able to group the words of a text by root with 100% accuracy. Where to get one, I don't know. Please post if you find anything.
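The difference between "same root string" and "cognates" can be shown in a few lines. Here "красный" (red) and "красивый" (beautiful) both segment to "крас", while a hand-curated family list decides whether they are cognates. The family assignments are hypothetical and simply follow the answer's view that the two are not cognates; they stand in for the human-built dictionary the answer describes.

```python
# Hypothetical, hand-curated cognate families; the ids are arbitrary labels.
COGNATE_FAMILY = {
    "красивый": "beauty",
    "прекрасный": "beauty",
    "красота": "beauty",
    "красный": "red",
    "краснеть": "red",
}

# What a root segmenter would report: all five words share the string "крас".
ROOT = {word: "крас" for word in COGNATE_FAMILY}


def same_root(a: str, b: str) -> bool:
    """True if both words are known and their root strings match."""
    return ROOT.get(a) is not None and ROOT.get(a) == ROOT.get(b)


def are_cognates(a: str, b: str) -> bool:
    """True if both words belong to the same curated family."""
    fa, fb = COGNATE_FAMILY.get(a), COGNATE_FAMILY.get(b)
    return fa is not None and fa == fb


print(same_root("красный", "красивый"))     # root strings match
print(are_cognates("красный", "красивый"))  # but the families differ
```

Comparing root strings gives a false positive for this pair; only the curated family list separates them.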

Tibor128, 2019-05-16
@Tibor128

Dig in the direction of morphological analysis.
Google it.
There are ready-made tools and rule sets for them.
Good luck ;)
