How to do binary classification of texts in Python?

Python

kosmo_tony, 2018-10-04 10:18:53

I have a set of scientific articles on various topics. They need to be classified into two classes: mathematical and non-mathematical. I have not found one, but maybe there is already a ready-made solution for this or a similar problem?

2 answer(s)

dmshar, 2018-10-04
@dmshar

In fact, there is not just a lot of material on the net but an immense amount, for every taste and for whatever tool you use. Almost any current book on machine learning or neural networks has a section on working with texts, and classification is usually the simplest task covered there.
Here is an almost elementary introduction to the topic: what to do, how, and why:
https://tproger.ru/translations/text-classificatio...
Here the texts are divided into 20 topics, but you can cut that down to the two you need:
scikit-learn.org/stable/tutorial/text_analytics/wo...
Here is a "ready-made solution" described using another library:
www.nltk.org
But the main problem is not the sources. The main question is whether you have a data set that is both large enough and labeled, so that an algorithm can be trained on it at all. If you do, go ahead and study the sources; if not, think about how to find such a set.
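To give a feel for how little code the labeled case needs, here is a minimal sketch with scikit-learn. It is only an illustration, not the approach from any of the links above: the six training sentences and their labels are invented, and the choice of TF-IDF features with logistic regression is an assumption. A real article collection would need far more labeled examples and a held-out set for evaluation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data, invented for illustration only:
# 1 = mathematical, 0 = non-mathematical.
train_texts = [
    "We prove a theorem about the convergence of series in Banach spaces",
    "The lemma gives an upper bound on eigenvalues of the matrix",
    "A proof of the theorem on prime numbers and integer polynomials",
    "The protein regulates gene expression in mouse liver cells",
    "Patients in the clinical trial received the new vaccine",
    "The novel describes social life in nineteenth century Paris",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Turn each text into a TF-IDF vector, then fit a linear classifier on top.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(),
)
model.fit(train_texts, train_labels)

# Classify texts the model has not seen before.
new_texts = [
    "We prove a lemma about eigenvalues of a matrix",
    "Gene expression in liver cells of patients",
]
predicted = model.predict(new_texts)
print(predicted)
```

Swapping `LogisticRegression` for another scikit-learn classifier only changes one line, so it is easy to compare a few models once you have real labeled data.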

Danil, 2018-10-04
@DanilBaibak

I agree with dmshar that there are many examples of solving similar problems online. I just want to add that if you do not have labeled data but are sure that the texts cover only 2 topics, there are also classification methods that work without labels - a good example.
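One common way to handle the unlabeled case is clustering. The sketch below is an assumption on my part and not necessarily the method from the example mentioned above: it uses TF-IDF vectors and k-means with two clusters, and the sample texts are invented. The cluster numbers are arbitrary, so you still have to read a few texts from each cluster to decide which one is "mathematical".

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Unlabeled toy texts, invented for illustration only.
texts = [
    "We prove a theorem about the convergence of series",
    "A proof of the theorem on prime numbers",
    "The lemma gives a bound on eigenvalues of the matrix",
    "The protein regulates gene expression in liver cells",
    "Gene expression changes in cells of the mouse liver",
    "Patients in the clinical trial received the vaccine",
]

# Represent each text as a TF-IDF vector.
vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)

# Split the texts into two groups without using any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(vectors)

# Print each text with its cluster number (0 or 1; the numbering is arbitrary).
for cluster_id, text in zip(cluster_ids, texts):
    print(cluster_id, text)
```

On a handful of short texts the split can easily be poor; with real articles the result depends heavily on how distinct the two topics are in vocabulary.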