Python
FlasheR_SPb, 2017-03-18 23:50:20

How to manage the degree of confidence in the training sample in machine learning?

Is it possible to assign different degrees of confidence in random forest and/or gradient boosting algorithms by splitting a large training set into time intervals?
That is, suppose we have a very large dataset in chronological order and want to train on data going back 10 years, but have the data from the last year influence the result more strongly.
Is this possible? Where should I look? What should I read about?


2 answers
Arseny Kravchenko, 2017-03-19
@Arseny_Info

Upsampling new data, downsampling old data
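
A minimal sketch of this idea in Python, under assumptions not stated in the answer: a hypothetical pandas DataFrame df with a date column, and an exponential decay with an assumed 2-year half-life. Drawing a bootstrap-style sample whose probabilities decay with age means recent rows tend to appear more than once while old rows are thinned out.

import pandas as pd

# Hypothetical training table: 10 years of daily observations.
df = pd.DataFrame({"date": pd.date_range("2007-03-18", "2017-03-18")})

# Give each row a sampling weight that halves every `half_life` years.
age_years = (df["date"].max() - df["date"]).dt.days / 365.25
half_life = 2.0  # assumed decay rate; tune for your task
weights = 0.5 ** (age_years / half_life)

# Resample to the original size with replacement: recent rows are
# effectively upsampled, old rows downsampled, in proportion to weight.
resampled = df.sample(n=len(df), replace=True, weights=weights, random_state=42)

The half-life controls how sharply influence falls off: with 2 years, a 10-year-old row is sampled about 1/32 as often as a fresh one.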

alexnss, 2017-05-09
@alexnss

Here it would be more correct to call this parameter not the degree of confidence but the weight.
For boosting, LightGBM definitely supports this: the parameter is called weight and is described in the LightGBM Parameters documentation (a Python sketch follows the quote below).
For random forest, the R package ranger has a case.weights parameter.

From the ranger documentation for case.weights:
"Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees."
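
Since the question is tagged Python, here is a minimal sketch using LightGBM's per-sample weight; the data, variable names, and the 2-year half-life are assumptions for illustration, not from the answer.

import numpy as np
import lightgbm as lgb

# Hypothetical data: X (features), y (targets), age_years (sample age).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=n)
age_years = rng.uniform(0, 10, size=n)

# Same exponential decay as in the first answer's sketch:
# newer samples get larger weights.
weights = 0.5 ** (age_years / 2.0)  # assumed half-life of 2 years

# LightGBM accepts per-sample weights directly in the Dataset.
train_set = lgb.Dataset(X, label=y, weight=weights)
params = {"objective": "regression", "learning_rate": 0.05, "verbosity": -1}
model = lgb.train(params, train_set, num_boost_round=100)

For a random forest in Python, scikit-learn's equivalent is passing the same weights to fit, e.g. RandomForestRegressor().fit(X, y, sample_weight=weights).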
