How to design a classifier with different data structures for training and testing?


Machine learning

Alexander Kurakin, 2016-10-08 22:08:08

I am writing my own classifier with scikit-learn, i.e. I subclass sklearn.base.BaseEstimator. However, the data structure differs between training and testing (the object being classified is a sports match):

in training, each sample consists of the names of the two teams plus the set of events that occurred in the match;
at test time, only the team names are available.
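To make the two data shapes concrete, here is a minimal sketch. It assumes, hypothetically, that matches are (home, away) pairs of team names and that events are categorical labels such as "goal_home"; all names and values are illustrative. Under that assumption, a set of events can become a matrix row with multi-hot encoding via scikit-learn's `MultiLabelBinarizer`. If you need counts rather than presence, `DictVectorizer` on dicts of counts is an alternative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training data: (home, away) team pairs plus one event set per match.
train_teams = [("A", "B"), ("B", "C"), ("A", "C")]
train_events = [
    {"goal_home", "goal_away"},
    {"red_card_away"},
    {"goal_home", "red_card_away"},
]

# Hypothetical test data: team pairs only, no events.
test_teams = [("C", "B")]

# Multi-hot encoding: one column per distinct event label seen in training,
# with columns ordered as in mlb.classes_ (labels sorted alphabetically).
mlb = MultiLabelBinarizer()
event_matrix = mlb.fit_transform(train_events)

print(mlb.classes_)
print(event_matrix)
```

The matrix has one row per training match. It still cannot be passed to `predict`, because nothing like it exists for `test_teams`.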

Note that a "set of events" is generally hard to represent as a matrix...
How should this be structured?
UPD. To clarify: what method signatures should I use if I inherit from sklearn.base.BaseEstimator?
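One possible answer to the signature question is sketched below. scikit-learn lets `fit` accept extra keyword arguments, so the hypothetical `events` parameter here is used only during training. `predict(X)` receives team names alone. Everything learned from the events is stored in fitted attributes ending in `_`, following the scikit-learn convention. The event format (a list of `(team, event_type)` tuples per match), the "average number of goals per match" statistic, and the label convention (1 = home team wins) are all illustrative assumptions, not a recommended model.

```python
from collections import defaultdict

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class MatchClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical sketch: X is a list of (home, away) team-name pairs."""

    def __init__(self, event_type="goal", default_rate=0.0):
        # Hyperparameters are stored unchanged, as BaseEstimator expects.
        self.event_type = event_type
        self.default_rate = default_rate

    def fit(self, X, y, events=None):
        # `events` is a hypothetical extra fit parameter: one list of
        # (team, event_type) tuples per match, available only in training.
        if events is None or len(events) != len(X):
            raise ValueError("fit() needs one event list per match")
        counts = defaultdict(float)
        games = defaultdict(int)
        for (home, away), match_events in zip(X, events):
            games[home] += 1
            games[away] += 1
            for team, kind in match_events:
                if kind == self.event_type:
                    counts[team] += 1
        # Average number of the chosen event per match, per team.
        self.team_rate_ = {t: counts[t] / games[t] for t in games}
        self.classes_ = np.unique(y)
        return self

    def _rate(self, team):
        # Teams never seen in training fall back to a default value.
        return self.team_rate_.get(team, self.default_rate)

    def predict(self, X):
        # Only team names are needed at prediction time.
        # Assumed label convention: 1 = home team wins, 0 = otherwise.
        return np.array(
            [1 if self._rate(home) > self._rate(away) else 0 for home, away in X]
        )
```

Usage: `MatchClassifier().fit(X_train, y_train, events=events_train).predict(X_test)`. Because `ClassifierMixin` supplies `score(X, y)` on top of `predict`, the test-time interface stays the standard one.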


1 answer(s)


Vlad_Fedorenko, 2016-10-09

Either don't train on features that won't be available at test time, or derive features from the training data, e.g. "team 1 has an average win rate of 0.67". But with that approach it is easy to overfit, and you can end up with a test set that contains a team that never appeared in the training set.
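The second option (per-team statistics computed from the training data) can be sketched as follows. The data format, `(home, away, home_won)` triples with draws ignored, is a hypothetical assumption. Two guards address the problems mentioned. First, additive smoothing toward a prior, which is my choice of technique and not part of the answer, keeps teams with few games from getting extreme rates. Second, a fallback returns the prior for teams absent from training.

```python
from collections import defaultdict


def smoothed_win_rates(matches, alpha=5.0, prior=0.5):
    """Per-team win rate shrunk toward `prior`.

    `matches` is a hypothetical list of (home, away, home_won) triples taken
    from the training data only; draws are ignored for simplicity.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for home, away, home_won in matches:
        games[home] += 1
        games[away] += 1
        wins[home if home_won else away] += 1
    # Additive smoothing: a team with few games stays close to the prior,
    # which limits overfitting to a handful of lucky results.
    return {t: (wins[t] + alpha * prior) / (games[t] + alpha) for t in games}


def win_rate_feature(team, rates, prior=0.5):
    # Teams that never appeared in training get the prior instead of a KeyError.
    return rates.get(team, prior)
```

With `alpha=2.0`, a team that won both of its two training matches gets 0.75 instead of a raw 1.0. A larger `alpha` shrinks estimates harder toward the prior.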