Criteria for selecting significant features for SVM classification (support vector machine)?
Hi friends!
Please help with advice or a link.
How to choose features for SVM classification? Is it necessary to normalize the numerical values of these features?
The task: use an SVM to learn to separate the wheat from the chaff.
The grains have some characteristic features by which they can be distinguished, but which features should be taken?
I'll give you an example. Let's say each grain has a weight in milligrams. The chaff also has a weight, but on average it differs from that of the grain. Can the raw weight be used as a feature, or is it better to take the logarithm of the weight, since some grains are very small and others are very large?
How should the ratio of grains to chaff in the training sample be chosen? Should it be 50/50? Or should it be taken from real life: harvest the grain, take a handful and build the sample from that (i.e., a ratio close to the real one)?
What should be done if in reality (and in the training sample) the ratio of grains to chaff is 1 to 200? Does that spoil the training sample?
After all, it is the grains that need to be singled out - they are important, and there are just very few of them.
Is there a guide in the "SVM for Dummies" style that covers these simple questions in plain terms, without solving complex systems of equations?
First of all, don't get hung up on SVM: it's just one of many classification methods. Yes, SVM has its own specifics (other methods have their own), but at this stage you can use general data preprocessing algorithms.
"Which features should be taken?"
This is called feature selection and feature extraction.
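To make feature selection a bit more concrete, here is a minimal sketch of one very simple approach: score each feature on its own by how far apart the two class means are relative to the spread within the classes, then rank the features by that score. The measurements, the feature names (`weight_mg`, `length_mm`) and the helper `separation_score` are made up for illustration; they do not come from the question. Real feature selection methods are more careful than this, for example about features that are only useful in combination.

```python
from statistics import mean, pstdev

def separation_score(grain_values, chaff_values):
    """Absolute difference of the class means divided by the average class std."""
    spread = (pstdev(grain_values) + pstdev(chaff_values)) / 2
    diff = abs(mean(grain_values) - mean(chaff_values))
    if spread == 0:
        # No variation inside either class: perfectly separable or useless
        return float("inf") if diff > 0 else 0.0
    return diff / spread

# Hypothetical measurements, for illustration only
grain = {"weight_mg": [40.0, 42.0, 38.0, 41.0], "length_mm": [6.0, 6.5, 5.5, 6.2]}
chaff = {"weight_mg": [5.0, 7.0, 6.0, 4.0], "length_mm": [6.3, 5.8, 6.4, 5.9]}

# Score every feature independently and sort from most to least separating
scores = {name: separation_score(grain[name], chaff[name]) for name in grain}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy data the weight separates the classes far better than the length, so it ends up first in the ranking.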
"Is it necessary to normalize the numerical values of these features?"
It strongly depends on the specific task and on the features themselves.
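As an illustration for the weight example from the question, here is a minimal sketch that takes the logarithm of the weights and then standardizes them to zero mean and unit standard deviation. The weights and the helper names `fit_scaler` and `apply_scaler` are assumptions made for this example. Whether the log transform actually helps has to be checked on the real data; the sketch only shows the mechanics.

```python
import math
from statistics import mean, pstdev

def fit_scaler(values):
    """Return the mean and standard deviation of the training values."""
    return mean(values), pstdev(values)

def apply_scaler(values, mu, sigma):
    """Rescale values using a mean and standard deviation computed earlier."""
    return [(v - mu) / sigma for v in values]

# Hypothetical weights in milligrams, spanning several orders of magnitude
train_weights_mg = [0.5, 2.0, 8.0, 40.0, 200.0]

# The log compresses the large values so they no longer dominate the scale
train_log = [math.log(w) for w in train_weights_mg]
mu, sigma = fit_scaler(train_log)
train_scaled = apply_scaler(train_log, mu, sigma)

# New data is transformed with the parameters learned on the training set
new_scaled = apply_scaler([math.log(15.0)], mu, sigma)
print(train_scaled, new_scaled)
```

The scaling parameters are computed on the training sample only and then reused for new data, so that the classifier sees new examples on the same scale it was trained on.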
"What should be done if the ratio of grains to chaff in reality (and in the training sample) is 1 to 200? Does that spoil the training sample?"
In general, it does: if there are far fewer examples of one class than of the other, there is a risk that the classifier will simply "memorize" the few examples from the training set and will not be able to adequately recognize other, similar examples (overfitting).
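One common way to soften such an imbalance, not mentioned above and given here only as a possible option, is to weight the classes so that errors on the rare class cost more during training. Below is a minimal sketch of the usual "balanced" heuristic, where each class gets the weight n_samples / (n_classes * class_count). The label names and the counts are made up to match the 1 to 200 ratio from the question, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in the labels."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical training labels with a 1:200 ratio of grain to chaff
labels = ["grain"] * 10 + ["chaff"] * 2000

weights = balanced_class_weights(labels)
print(weights)
```

With these counts a single grain example weighs 200 times as much as a single chaff example. In scikit-learn the same heuristic can be requested by passing `class_weight="balanced"` to `SVC`. Weighting does not create new information about the rare class, so collecting more grain examples is still worth it where possible.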