HaruAtari, 2014-05-19 13:44:21
Algorithms

How to get a list of words frequently found in a text?

There is a text. I need to parse it and output a list of the words that occur in it, along with the number of occurrences of each. The search should be "smart": it should take word forms into account, and the words in the result should be written in their base form (the infinitive).
Can you tell me what this procedure is called? Or is there a library for this? The language is not important, but PHP/Python/Java/Scala are preferred.

3 answer(s)
becks, 2014-05-19
@becks

Have a look at Sphinx (sphinxsearch.com/).
The procedure of reducing a word form to its normal form is called normalization (a task in morphological analysis). AOT (aot.ru) also handles it well. For a GOOD search, you need to use a search engine (Sphinx or others). Sphinx returns per-word statistics along with the search results.
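
To make the idea concrete, here is a minimal sketch in Python of "normalize, then count". It is not how Sphinx or AOT work internally. The `TOY_LEMMAS` table and the `normalize` and `word_frequencies` functions are hypothetical names made up for this illustration; in practice the lookup table would be replaced by a call to a real morphological analyzer (this step is often also called lemmatization).

```python
import re
from collections import Counter

# Hypothetical toy lemma table, for illustration only.
# A real morphological analyzer would replace this lookup.
TOY_LEMMAS = {
    "ran": "run",
    "runs": "run",
    "running": "run",
    "cats": "cat",
}


def normalize(word):
    """Map a word form to its base form; unknown words are returned unchanged."""
    return TOY_LEMMAS.get(word, word)


def word_frequencies(text, normalize=normalize):
    """Count how often each normalized word occurs in the text."""
    # \w+ matches runs of letters, digits and underscores (Unicode-aware in Python 3).
    tokens = re.findall(r"\w+", text.lower())
    return Counter(normalize(token) for token in tokens)


freq = word_frequencies("The cat runs. Cats ran, and the cat is running.")
# Print the words from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

The normalizer is passed in as a parameter, so the counting code stays the same when the toy table is swapped for a proper analyzer.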

Pavel Solovyov, 2014-05-19
@pavel_salauyou

For this you need to use Elasticsearch and facets.

xmoonlight, 2018-07-27
@xmoonlight

here
