Machine learning
Alexander Kurakin, 2016-10-08 22:08:08

How to design a classifier with different data structures for training and testing?

I am writing my own classifier with scikit-learn, that is, I inherit from sklearn.base.BaseEstimator. But the data structure differs between training and testing (the object being classified is a sports match):

  • in training, it is the team names plus the set of events that occurred during the match,
  • and in testing, it is only the team names.

Note that a "set of events" is generally hard to represent as a matrix...
How should this be organized?
UPD. To be precise: what signatures should the methods have if I inherit from sklearn.base.BaseEstimator?


1 answer(s)
Vlad_Fedorenko, 2016-10-09

Either do not use features in training that will not be available at test time, or derive aggregate information from the training set, along the lines of "team 1 has an average win rate of 0.67". But with the second approach it is easy to overfit, and to run into a situation where the test set contains a team that was not in the training set.
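
As for the signatures: the scikit-learn convention is fit(X, y) and predict(X). Below is a minimal sketch of the second option, not a definitive implementation. The class name TeamWinRateClassifier, the prior and smoothing parameters, and the optional events keyword argument are all assumptions made for illustration. X holds only the team names in both training and testing, the per-team win rate is learned in fit, and teams unseen in training fall back to the prior.

```python
from collections import defaultdict

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class TeamWinRateClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical sketch: predicts whether the first team wins, from names only."""

    def __init__(self, prior=0.5, smoothing=1.0):
        self.prior = prior          # win rate assumed for teams never seen in training
        self.smoothing = smoothing  # pulls small-sample win rates toward the prior

    def fit(self, X, y, events=None):
        # X: shape (n_matches, 2), team names; y: 1 if the first team won, else 0.
        # events: train-only data (assumption); ignored here, but this is where
        # per-team aggregates derived from match events could be computed.
        X = np.asarray(X, dtype=object)
        y = np.asarray(y)
        wins = defaultdict(float)
        games = defaultdict(float)
        for (first, second), outcome in zip(X, y):
            games[first] += 1
            games[second] += 1
            wins[first] += outcome
            wins[second] += 1 - outcome
        # Smoothed average win rate per team, to limit overfitting on few matches.
        self.win_rate_ = {
            team: (wins[team] + self.smoothing * self.prior)
            / (games[team] + self.smoothing)
            for team in games
        }
        self.classes_ = np.array([0, 1])
        return self

    def predict(self, X):
        # Same structure as in training: team names only.
        X = np.asarray(X, dtype=object)
        rates = np.array(
            [[self.win_rate_.get(team, self.prior) for team in row] for row in X]
        )
        return (rates[:, 0] > rates[:, 1]).astype(int)


# Usage with made-up team names; "Z" does not appear in the training data.
X_train = [["A", "B"], ["A", "C"], ["B", "C"]]
y_train = [1, 1, 0]
clf = TeamWinRateClassifier().fit(X_train, y_train)
predictions = clf.predict([["A", "B"], ["B", "A"], ["Z", "B"]])
```

This way the event data never has to be squeezed into the X matrix: it only influences what fit stores on the estimator, while predict sees the same two columns as in training.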
