Python
FlasheR_SPb, 2017-03-18 23:50:20

How to manage the degree of confidence in the training sample in machine learning?

Is it possible to assign different degrees of confidence in random forest and/or gradient boosting algorithms by splitting a large training set into time intervals?
That is, suppose we have a very large dataset in chronological order and want to train on data going back 10 years, but have the data from the last year influence the result more strongly.
Is this possible? Where should I look? What should I read about?


2 answers
Arseny Kravchenko, 2017-03-19
@Arseny_Info

Upsampling new data, downsampling old data
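
A minimal sketch of this idea in Python, under assumptions not stated in the answer: a hypothetical pandas DataFrame df with a date column, and an exponential decay with an assumed 2-year half-life. Drawing a bootstrap-style sample whose probabilities decay with age means recent rows tend to appear more than once while old rows are thinned out.

import pandas as pd

# Hypothetical training table: 10 years of daily observations.
df = pd.DataFrame({"date": pd.date_range("2007-03-18", "2017-03-18")})

# Give each row a sampling weight that halves every `half_life` years.
age_years = (df["date"].max() - df["date"]).dt.days / 365.25
half_life = 2.0  # assumed decay rate; tune for your task
weights = 0.5 ** (age_years / half_life)

# Resample to the original size with replacement: recent rows are
# effectively upsampled, old rows downsampled, in proportion to weight.
resampled = df.sample(n=len(df), replace=True, weights=weights, random_state=42)

The half-life controls how sharply influence falls off: with 2 years, a 10-year-old row is sampled about 1/32 as often as a fresh one.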

alexnss, 2017-05-09
@alexnss

Here it would be more correct to call this parameter not the degree of confidence but the weight.
For boosting, LightGBM definitely supports this: the parameter is called weight and is described in the LightGBM Parameters documentation (a Python sketch follows the quote below).
For random forest, the R package ranger has a case.weights parameter.

From the ranger documentation for case.weights:
"Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees."
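
Since the question is tagged Python, here is a minimal sketch using LightGBM's per-sample weight; the data, variable names, and the 2-year half-life are assumptions for illustration, not from the answer.

import numpy as np
import lightgbm as lgb

# Hypothetical data: X (features), y (targets), age_years (sample age).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=n)
age_years = rng.uniform(0, 10, size=n)

# Same exponential decay as in the first answer's sketch:
# newer samples get larger weights.
weights = 0.5 ** (age_years / 2.0)  # assumed half-life of 2 years

# LightGBM accepts per-sample weights directly in the Dataset.
train_set = lgb.Dataset(X, label=y, weight=weights)
params = {"objective": "regression", "learning_rate": 0.05, "verbosity": -1}
model = lgb.train(params, train_set, num_boost_round=100)

For a random forest in Python, scikit-learn's equivalent is passing the same weights to fit, e.g. RandomForestRegressor().fit(X, y, sample_weight=weights).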
