PHP
Azat Yarullin, 2014-02-09 13:35:42

SVM: how do I train the model properly, and what should the input data look like?

For testing I use the php_svm extension: www.php.net/manual/ru/book.svm.php
I have an incoming data stream: city, time (hour), temperature.
The output is one of 5 options: 1, 2, 3, 4, 5. I need to teach the machine to find the answer identifier from the values of the input data.
How should I present this data for training? Is my reasoning correct?
About the result:
Since an SVM can only output -1 or 1, does that mean I have to pass the answer identifier in as one of the input parameters? For example: Moscow, 12:00, 0 degrees, answer 3. These inputs together with answer 3 are labeled 1; with any other answer number they are labeled -1.
About the city:
I assign an index to each city: 1 - Moscow, 2 - Samara, 3 - Yekaterinburg, and use these indices as feature keys.
Hours, temperature and answer:
Same approach as for the city. To keep the indices unique, each parameter gets its own range.
For clarity, here is an array of the indices:

$indexes = [
  // cities
  ...range(1, 99),
  // hours
  ...range(100, 199),
  // temperature
  ...range(200, 299),
  // answers
  ...range(300, 399),
];
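The index ranges above can be turned into a small encoder. This is a minimal sketch, assuming hours 0..23 and whole-degree temperatures in -50..49 (shifted by +50 so they land in 200..299); `encodeSample`, the constants and the city map are hypothetical names, not part of php_svm. Each known parameter contributes one feature with value 1, and a missing temperature is simply left out, as in the last row of `$train`.

```php
<?php
// Hypothetical encoder for the index ranges described above.
// Assumptions (not from the question): hours are 0..23 and temperatures
// are whole degrees in -50..49, shifted by +50 so they fit 200..299.

const CITY_INDEX = ['Moscow' => 1, 'Samara' => 2, 'Yekaterinburg' => 3];
const HOUR_BASE = 100;
const TEMP_BASE = 200;
const TEMP_SHIFT = 50;

function encodeSample(string $city, int $hour, ?int $temperature): array
{
    $features = [];
    // One active feature per known parameter, with value 1 (one-hot).
    $features[CITY_INDEX[$city]] = 1;          // city assumed to be in the map
    $features[HOUR_BASE + $hour] = 1;
    // A missing temperature leaves its range empty (sparse input).
    if ($temperature !== null) {
        $features[TEMP_BASE + $temperature + TEMP_SHIFT] = 1;
    }
    ksort($features); // ascending keys, as in libsvm's text format
    return $features;
}

$x = encodeSample('Moscow', 12, 0); // [1 => 1, 112 => 1, 250 => 1]
```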

For training, I get the following data:
$svm = new SVM();
$train = [
  [-1, 1 => 1, 100 => 1, 200 => 1, 300 => 1], // city 1, hour 100, temperature 200, answer 300
  [-1, 1 => 1, 100 => 1, 200 => 1, 301 => 1], // city 1, hour 100, temperature 200, answer 301
  [1,  1 => 1, 100 => 1, 200 => 1, 302 => 1], // city 1, hour 100, temperature 200, answer 302
  [1,  3 => 1, 120 => 1, 304 => 1],           // city 3, hour 120, temperature unknown, answer 304
];
$model = $svm->train($train);

And a similar array of test data:
// The comments show the result I expect
$tests = [
  [1 => 1, 100 => 1, 200 => 1, 300 => 1], // -1?
  [100 => 1, 301 => 1], // -1?
  [100 => 1, 200 => 1, 302 => 1], // 1?
];
foreach($tests as $test){
  $result = $model->predict($test);
  var_dump($result);
}
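The labeling scheme from "About the result" (repeat each observation once per candidate answer, add the candidate as a feature, and label the row 1 only for the observed answer) can be sketched as follows. `expandWithCandidates` and `ANSWER_BASE` are hypothetical names; putting the label at key 0 mirrors the `$train` array.

```php
<?php
// Hypothetical helper illustrating the scheme from the question: every
// observation is repeated once per candidate answer, the candidate is added
// as an extra feature (300 + answer), and the row is labeled 1 only when the
// candidate is the observed answer. ANSWER_BASE mirrors the 300..399 range.

const ANSWER_BASE = 300;

function expandWithCandidates(array $features, int $answer, array $candidates): array
{
    $rows = [];
    foreach ($candidates as $candidate) {
        $label = ($candidate === $answer) ? 1 : -1;
        // Key 0 holds the label, as in the $train array.
        $row = [0 => $label] + $features;
        $row[ANSWER_BASE + $candidate] = 1;
        $rows[] = $row;
    }
    return $rows;
}

// Moscow (1), 12:00 (index 112), 0 degrees (index 250), observed answer 3.
$rows = expandWithCandidates([1 => 1, 112 => 1, 250 => 1], 3, [1, 2, 3, 4, 5]);
```

With 5 candidate answers, each observation yields one row labeled 1 and four labeled -1; with the 200 answers from question 5 it would be one against 199, so the training set becomes heavily imbalanced toward -1.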

I suspect the parameter values should not all be 1.
I tried training the machine on randomly generated inputs and outputs (up to 10,000 samples). With all parameter weights (feature values) set to 1 the model always returns -1; with weights of 0.01 it always returns 1.
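The "weights" here are the feature values. The libsvm authors' practical guide recommends scaling each attribute to a range such as [0, 1] or [-1, 1] before training; one-hot features with value 1 are already in that range. A minimal sketch of an alternative layout, assuming hours 0..23 and temperatures -50..50: hour and temperature become single numeric features scaled to [0, 1], so 11:00 and 12:00 end up numerically close, which separate one-hot indices cannot express. `minMaxScale`, `scaledFeatures` and the indices 100/101 are hypothetical.

```php
<?php
// Sketch: hour and temperature as single numeric features scaled to [0, 1]
// (min-max scaling). Assumed layout: cities 1..99 (one-hot), hour at 100,
// temperature at 101. Assumed value ranges: hours 0..23, temps -50..50.

function minMaxScale(float $value, float $min, float $max): float
{
    if ($max <= $min) {
        return 0.0; // degenerate range: nothing to scale
    }
    return ($value - $min) / ($max - $min);
}

function scaledFeatures(int $cityIndex, int $hour, float $temperature): array
{
    return [
        $cityIndex => 1,                           // city stays one-hot
        100 => minMaxScale($hour, 0, 23),          // hour in [0, 1]
        101 => minMaxScale($temperature, -50, 50), // temperature in [0, 1]
    ];
}

$f = scaledFeatures(1, 12, 0); // Moscow, 12:00, 0 degrees
```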
Questions:
1. Is my way of constructing the training input correct?
2. Is the test array composed correctly?
3. If the answer identifier cannot be passed as an input parameter, what should I do instead?
4. What weight should the input parameters have? How do I calculate it?
5. Does anything need to change if there are 20 input parameters (about 100 possible values each) and 200 possible results?
6. Are there code examples that work through similar tasks?
Thank you.


1 answer(s)
mrgloom, 2014-06-09

If the output can take 5 different values, this is a multiclass SVM problem.
It can be built on top of binary classification (which outputs ±1) using the one-vs-one or one-vs-all (one-vs-rest) methods.
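A sketch of the two reductions, in plain PHP so it does not depend on the php_svm API. The function names are hypothetical, training the binary models is not shown, and the scores and pairwise winners are assumed inputs. Note that libsvm, which php_svm wraps, already handles multiclass C-SVC internally with one-vs-one, so rows labeled directly with 1..5 at key 0 should train a 5-class model without a manual reduction and without putting the answer into the features.

```php
<?php
// Sketch of one-vs-rest and one-vs-one, independent of any SVM library.
// Scores and pairwise winners are assumed to come from binary models
// trained on the relabeled data (training itself is not shown).

// One-vs-rest: for class $k, relabel each row (label at key 0) to
// +1 if it belongs to $k, otherwise -1. Feature keys are preserved.
function relabelOneVsRest(array $rows, int $k): array
{
    return array_map(function (array $row) use ($k): array {
        $row[0] = ($row[0] === $k) ? 1 : -1;
        return $row;
    }, $rows);
}

// One-vs-rest prediction: the class whose binary model scores highest wins.
function pickOneVsRest(array $scoresByClass): int
{
    arsort($scoresByClass);
    return (int) array_key_first($scoresByClass);
}

// One-vs-one prediction: each pairwise model votes for one class
// (5 classes give 10 pairs); the class with the most votes wins.
function pickOneVsOne(array $pairWinners): int
{
    $votes = array_count_values($pairWinners);
    arsort($votes);
    return (int) array_key_first($votes);
}
```

One-vs-rest needs one binary model per class (5 here, 200 for question 5), while one-vs-one needs one per pair of classes (10 here).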
