Criteria for selecting significant features for SVM classification (support vector machine)?

Support vector machine

pixx, 2010-11-14 14:03:36

Hi friends!
Please help with advice or a link.
How to choose features for SVM classification? Is it necessary to normalize the numerical values of these features?
The task is to use an SVM to learn to separate the wheat from the chaff.
The grains have some characteristic features by which they can be distinguished, but which features should be taken?
Here is an example. Say the grain has a weight in milligrams. The chaff also has a weight, but on average it differs from that of the grain. Can the weight be used as a feature directly, or should I take the logarithm of the weight, since there are both very small and very large grains?
How should the ratio of grains to chaff in the training sample be chosen? Should it be 50/50? Or taken from real life: harvest the grain, take a handful, and build the sample from it (i.e., the ratio is close to the real one)?
What if the number of grains in reality (and in the training sample) relates to the amount of chaff as 1/200? Does that spoil the training sample?
After all, it is the grains that need to be singled out - they are important, and there are just very few of them.
Is there a manual in the "SVM for Dummies" vein that covers these simple questions in plain terms, without solving complex systems of equations?

1 answer(s)

YasonBy, 2010-11-15

First of all, don't get hung up on SVM: it's just one of many classification methods. Yes, SVM has its own specifics (other methods have their own), but at this stage you can use general data preprocessing algorithms.

which features should be taken?

This is called feature selection and feature extraction.
In simple words, the process looks like this:
1. We make a list of available features.
2. We add to it various functions of the features (like the logarithm of weight mentioned above), combinations of different features (for example, length * width * height), etc. What exactly to combine and which transformations to use should be suggested by knowledge of the task and common sense. This process is referred to as feature extraction.
3. We define the error function, that is, we decide how classification quality will be evaluated. For example, it can be based on the ratio of correctly recognized examples to their total number. Here it is useful to read about precision and recall.
4. We move up one level of abstraction.
Imagine a kind of black box that contains a classifier along with the training and test samples. The input of the box is a binary vector indicating which features the classifier should use; the output is the value of the classification error (on the test sample).
Thus, the problem of feature selection reduces to an optimization problem: you need to find an input vector for which the output value of the box (the classification error) is minimal. You can, for example, add features greedily, one at a time, starting with those that improve the result the most (usually called forward selection; compare gradient descent, of which it is a rough discrete analogue). You can also use something more serious, such as genetic algorithms.
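
The black-box view above can be sketched in a few lines. This is only an illustration under assumptions, not a recommended implementation: the data are synthetic (hypothetical `make_sample`), the classifier inside the box is a simple nearest-centroid rule standing in for an SVM, and the names `black_box` and `forward_selection` are made up for this example.

```python
import random
from statistics import mean

def make_sample(n, rng):
    """Hypothetical data: features 0 and 2 carry signal, 1 and 3 are noise."""
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        x = [
            rng.gauss(2.0 * label, 1.0),   # informative
            rng.gauss(0.0, 1.0),           # pure noise
            rng.gauss(-1.5 * label, 1.0),  # informative
            rng.gauss(0.0, 1.0),           # pure noise
        ]
        rows.append((x, label))
    return rows

def black_box(mask, train, test):
    """Input: binary vector of enabled features. Output: error on the test sample.
    A nearest-centroid classifier stands in for the SVM here."""
    idx = [i for i, on in enumerate(mask) if on]
    if not idx:
        return 1.0  # no features enabled: treat as the worst case
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        centroids[label] = [mean(p[i] for p in pts) for i in idx]
    errors = 0
    for x, y in test:
        dist = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, centroids[c]))
                for c in centroids}
        if min(dist, key=dist.get) != y:
            errors += 1
    return errors / len(test)

def forward_selection(n_features, train, test):
    """Greedily enable the feature that lowers the error most; stop when none helps."""
    mask = [False] * n_features
    best_err = 1.0
    while True:
        candidates = []
        for i in range(n_features):
            if not mask[i]:
                trial = mask.copy()
                trial[i] = True
                candidates.append((black_box(trial, train, test), i))
        if not candidates:
            break
        err, i = min(candidates)
        if err >= best_err:
            break  # no remaining feature improves the result
        mask[i] = True
        best_err = err
    return mask, best_err

rng = random.Random(0)
train, test = make_sample(400, rng), make_sample(400, rng)
mask, err = forward_selection(4, train, test)
print(mask, err)
```

Note that choosing features by their error on the test sample tunes the selection to that sample; in practice a separate validation sample (or cross-validation) is usually used inside the box.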

Is it necessary to normalize the numerical values of these features?

It strongly depends on the specific task and the features themselves.
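
As one possible illustration (not a rule): an SVM with a distance-based kernel is sensitive to feature scale, so a heavy-tailed feature such as weight is often log-transformed and then standardized. The weights below are invented, and `fit_standardizer` is a hypothetical helper; its statistics would normally be computed on the training sample only and reused for new data.

```python
import math
from statistics import mean, stdev

# Hypothetical weights in milligrams, spanning several orders of magnitude.
weights_mg = [0.8, 1.2, 2.5, 40.0, 55.0, 300.0]

# The log transform compresses the range so large grains do not dominate.
log_weights = [math.log(w) for w in weights_mg]

def fit_standardizer(values):
    """Return a function mapping a value to its z-score w.r.t. `values`."""
    mu, sigma = mean(values), stdev(values)
    return lambda v: (v - mu) / sigma

scale = fit_standardizer(log_weights)
scaled = [scale(v) for v in log_weights]
print([round(v, 2) for v in scaled])
```

Whether the logarithm helps can be checked the same way as any other feature choice: by comparing the classification error with and without it.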

What if the number of grains in reality (and in the training sample) relates to the amount of chaff as 1/200? Does that spoil the training sample?

In general, it does: if examples of one class are far fewer than those of the other, there is a risk that the classifier will simply "memorize" those few examples from the training set and fail to recognize other, similar examples (overfitting).
In addition, if the simplest error function (correctly_recognized / sample_size) is used, a philosophical classifier can always answer "chaff", and in 99.5% of cases it will be right :)
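
The arithmetic behind the "philosophical classifier" is easy to check, along with the precision and recall mentioned earlier. A minimal sketch, assuming a toy sample with exactly the 1/200 ratio from the question; the helper names are made up for this example, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
# Toy sample: 1 grain per 200 chaff, as in the question.
labels = ["grain"] * 1 + ["chaff"] * 200
# The "philosophical" classifier always answers "chaff".
predictions = ["chaff"] * len(labels)

def accuracy(y_true, y_pred):
    """Share of correctly recognized examples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class `positive` (0.0 if undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

acc = accuracy(labels, predictions)
precision, recall = precision_recall(labels, predictions, positive="grain")
print(f"accuracy={acc:.4f} precision={precision} recall={recall}")
```

Accuracy comes out at 200/201, about 99.5%, while recall for "grain" is zero: not a single grain is found, which is why plain accuracy is a poor error function for this task.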