How to weed out irrelevant texts?
Good afternoon.
There is a set of texts. It contains insignificant messages such as "+100500" and "I am of the same opinion", as well as significant ones such as "Angara cost the state too much" and "What happened in Siberia is simply amazing".
That is, for this task significance is determined by whether a message can be "attached" to a specific topic. I need a rough (fast) way to select the significant texts.
What approaches could be used here?
So far, the only idea I have is to use mystem/phpmorphy to determine which parts of speech occur in each message and in what ratio, and then filter by that coefficient. However, this will obviously not be very effective and may discard meaningful texts...
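To make the idea concrete, here is a minimal sketch of that ratio filter in Python. It is only an illustration under stated assumptions: `toy_tag` and `FUNCTION_WORDS` are hypothetical stand-ins for a real morphological analyzer such as mystem or phpmorphy, and the thresholds `min_content` and `min_ratio` are arbitrary values that would need tuning on real data.

```python
import re
from typing import Callable, Tuple

# Hypothetical stand-in for a real analyzer (mystem/phpmorphy):
# a tiny hand-written list of English function words.
FUNCTION_WORDS = {
    "i", "am", "of", "the", "same", "too", "much",
    "what", "in", "is", "a", "an", "and", "to", "it",
}


def toy_tag(token: str) -> str:
    """Coarse tag: 'FUNC' for listed function words, 'CONTENT' otherwise."""
    return "FUNC" if token in FUNCTION_WORDS else "CONTENT"


def tokenize(text: str) -> list:
    """Lowercase the text and keep only runs of letters (digits and signs are dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def content_stats(text: str, tag: Callable[[str], str] = toy_tag) -> Tuple[int, float]:
    """Return (number of content words, share of content words among all tokens)."""
    tokens = tokenize(text)
    content = sum(1 for t in tokens if tag(t) == "CONTENT")
    ratio = content / len(tokens) if tokens else 0.0
    return content, ratio


def is_significant(
    text: str,
    tag: Callable[[str], str] = toy_tag,
    min_content: int = 2,      # assumed threshold, not tuned
    min_ratio: float = 0.3,    # assumed threshold, not tuned
) -> bool:
    """Keep a message only if it has enough content words, both in count and in share."""
    content, ratio = content_stats(text, tag)
    return content >= min_content and ratio >= min_ratio


if __name__ == "__main__":
    for msg in ["+100500", "I am of the same opinion",
                "Angara cost the state too much"]:
        print(msg, "->", is_significant(msg))
```

The tagger is passed in as a parameter so that a real part-of-speech analyzer could replace `toy_tag` without touching the filter itself. As noted above, a filter like this is crude and may still discard short but meaningful messages.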