Bayesian classifier, category selection problem
The situation is as follows:
I have a classifier that sorts documents into several categories (say 'good', 'bad' and 'unknown').
The category is chosen using the formula
Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document)
where Pr(Document) is the same for every category, so it can be dropped when comparing categories.
Pr(Category) is the probability that a random document falls into this category, calculated by the formula
number of documents in this category / total number of documents
The problem is that after training, one category ended up with 4 times more documents than each of the others, and as a result every classified document falls into that category. If the number of samples in the categories is approximately the same, everything works as it should (which, in principle, is not surprising).
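To make the effect of the prior concrete, here is a minimal sketch in Python. The function names, the 40/10/10 split, and the equal likelihood values are made up for illustration and are not taken from the actual classifier. It computes Pr(Category) from counts and combines it with a given log Pr(Document | Category), working in log space to avoid underflow.

```python
import math
from collections import Counter

def class_priors(labels):
    """Pr(Category) = documents in the category / total documents."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def posteriors(log_likelihoods, priors):
    """Normalize Pr(Document | Category) * Pr(Category) over all categories."""
    scores = {c: log_likelihoods[c] + math.log(priors[c]) for c in priors}
    top = max(scores.values())  # subtract the max before exp() for stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Hypothetical training set: one category is 4 times larger than the others.
labels = ["good"] * 40 + ["bad"] * 10 + ["unknown"] * 10
priors = class_priors(labels)

# Assumed likelihoods: the document fits every category equally well,
# so the posterior is decided by the prior alone.
equal = {c: math.log(0.001) for c in priors}
result = posteriors(equal, priors)
winner = max(result, key=result.get)
```

When the likelihoods are equal, the posterior is just the prior, so `winner` is the largest category. In a real classifier the likelihoods differ, and the prior only tips the balance when they are close.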
Question: how do I deal with this?
I assume the most correct option is to equalize the number of samples per category by carefully selecting the training data, but what if?
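One simple way to equalize the categories is random undersampling: keep only as many documents from each category as the smallest category has. Below is a rough sketch in Python. The `undersample` helper and the toy data are hypothetical, and this is only one of several possible ways to balance a training set.

```python
import random
from collections import defaultdict

def undersample(documents, labels, seed=0):
    """Return (document, label) pairs with the same count for every category."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_label = defaultdict(list)
    for doc, label in zip(documents, labels):
        by_label[label].append(doc)
    # Every category is cut down to the size of the smallest one.
    smallest = min(len(docs) for docs in by_label.values())
    balanced = []
    for label, docs in by_label.items():
        for doc in rng.sample(docs, smallest):
            balanced.append((doc, label))
    rng.shuffle(balanced)
    return balanced

# Toy data: 'good' is 4 times larger than each of the other categories.
labels = ["good"] * 40 + ["bad"] * 10 + ["unknown"] * 10
docs = [f"doc{i}" for i in range(len(labels))]
balanced = undersample(docs, labels)
```

The cost is that documents from the large category are thrown away, so this only makes sense when the smallest category is still big enough to train on.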
I agree with the previous advice. Try to select the training data so that it represents the distribution of the general population, and train the classifier on it.