Neural networks
Pantuchi, 2021-06-13 21:23:52

LSTM prediction for 1 step, how to predict for N steps?

There is a dataset of 2 columns, each with 4000 observations. With test validation, where the window consists of 100 values and the prediction is 1 step ahead, everything works fine. But how do I predict N steps beyond the dataset?

Training sample: 3500 observations; test sample: 500; the window includes 100 observations.

X = []
y = []
for i in range(self.__seq_length, train.shape[0]):
    X.append(train[i - self.__seq_length: i])  # window: the last seq_length rows, both columns
    y.append(train[i, 0])                      # target: the next value of the first column
return np.array(X), np.array(y)
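Run on a toy array, the windowing above produces the shapes described in the question. This is a minimal sketch with a plain function in place of the class method; the small 2-column array and `seq_length=3` are made up for illustration:

```python
import numpy as np

def make_windows(train, seq_length):
    """Sliding windows over a 2-column array: X holds the windows,
    y holds the next value of column 0 after each window."""
    X, y = [], []
    for i in range(seq_length, train.shape[0]):
        X.append(train[i - seq_length: i])  # shape (seq_length, 2)
        y.append(train[i, 0])               # scalar target
    return np.array(X), np.array(y)

# toy data: 10 observations, 2 columns
train = np.arange(20, dtype=float).reshape(10, 2)
X, y = make_windows(train, seq_length=3)
print(X.shape)  # (7, 3, 2)
print(y.shape)  # (7,)
```

With the real data (4000 rows, window 100) this yields X of shape (3900, 100, 2), which matches the (samples, timesteps, features) layout an LSTM expects.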

The prediction target is a single value from the first column.

How do I predict beyond the dataset?
The best I came up with was to move the windows in a loop of 100 iterations: shift the window for the second column to the left and the window for the first column to the right, each time appending the new prediction, so that the desired array of shape (1, 100, 2) is formed:

predicted = []

for i in range(self.__seq_length + 1):
    y = []
    if i == 0:
        # initial window: the last seq_length rows of the dataset
        window = self.__dataset[-(self.__seq_length + i):, 1]
        y = self.__dataset[-(self.__seq_length + i):, 0]
    else:
        # the second-column window slides one step into the past each iteration
        window = self.__dataset[-(self.__seq_length + i):-i, 1]
        ls = -(self.__seq_length - i)
        y = []
        if ls < 0:
            # part of the first-column window still comes from real data...
            pred_next_window = self.__dataset[ls:, 0]
            y.extend(pred_next_window)
            # ...the rest is filled with the values predicted so far
            y.extend(predicted)
            y = np.array(y)
        else:
            # the first-column window now consists entirely of predictions
            y.extend(predicted)
            y = np.array(y)

    new_seq = np.transpose(np.array((y, window)))

    print(str.format('iteration: {0}\n{1}', i + 1, new_seq))

    scale = MinMaxScaler()
    new_seq = scale.fit_transform(new_seq)

    new_seq = np.reshape(new_seq, (1, new_seq.shape[0], new_seq.shape[1]))
    output = self.__model.predict(new_seq)

    # rescale the model output back to the original value range
    scale_ = self.__val_scale / scale.scale_[0]
    predict = output * scale_

    print(str.format('Next Value: {0}', predict))

    predicted.append(predict[0, 0])

return np.array(predicted)
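The window mechanics of that loop can be checked in isolation with a stub in place of the LSTM. Everything below is a made-up illustration: `fake_predict` is a stand-in that just returns the mean of the first-column window, and the small dataset is arbitrary; only the slicing logic mirrors the loop above:

```python
import numpy as np

def fake_predict(window_col0):
    # stand-in for model.predict: returns the mean of the first-column window
    return float(np.mean(window_col0))

seq_length = 5
dataset = np.column_stack([
    np.arange(20, dtype=float),        # column 0: target series
    np.arange(100, 120, dtype=float),  # column 1: second feature
])

predicted = []
for i in range(seq_length):
    if i == 0:
        window = dataset[-seq_length:, 1]          # last seq_length values of column 1
        y = dataset[-seq_length:, 0]
    else:
        window = dataset[-(seq_length + i):-i, 1]  # column 1 slides left (into the past)
        real = dataset[-(seq_length - i):, 0]      # remaining real column-0 values
        y = np.concatenate([real, predicted])      # padded on the right with predictions
    # both windows keep a constant length, so the model input shape never changes
    assert len(y) == seq_length and len(window) == seq_length
    predicted.append(fake_predict(y))

print(len(predicted))  # 5
```

The asserts confirm that each iteration feeds the model a full-length pair of windows, which is the invariant the original loop relies on.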


As a result:
While the test prediction (the red chart) runs along with the actual values (the green chart), everything looks fine; what comes after the green chart is already the moving-window manipulation. But what do I do, given that the prediction still depends on the 2nd column?
[chart: test prediction (red) vs. actual values (green)]


2 answer(s)
rPman, 2021-06-13

I'm afraid your prediction fails: your algorithm effectively produces an average, which means the predicted data cannot be mixed back in with the original data to predict further values, because the dynamics of the predicted data are different.
Upd.: as an option, try training a second prediction algorithm that works solely on the data predicted by the first one, since the first seems to act as an averager with a faster response than classical moving averages.
Also, get rid of those step ladders in the data: for any algorithm that is chaos, and it cannot be predicted. Transform the data into a different form and eliminate the infinite values of the first derivative. Predict not the raw values themselves but quantities computed over them, for example: statistics over the data in the window (min/max, mean, etc.), the integral (area under the graph) of the values above and below some average, the probability of being above/below a value (as a vector over several values), the number of hits in given intervals, and so on.
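A minimal sketch of the kind of transformation described here: instead of the raw series, compute per-window statistics and predict those. The window size and the particular statistics chosen below are arbitrary illustrations, not a prescription:

```python
import numpy as np

def window_stats(series, window):
    """For each full window, return [min, max, mean]:
    derived features to predict instead of the raw values."""
    feats = []
    for i in range(window, len(series) + 1):
        w = series[i - window:i]
        feats.append([w.min(), w.max(), w.mean()])
    return np.array(feats)

series = np.array([1., 3., 2., 8., 5., 4.])
feats = window_stats(series, window=3)
print(feats.shape)  # (4, 3): 4 windows, 3 statistics each
```

Step-like jumps that dominate the raw series are smoothed out in such aggregates, which is exactly why they are easier for a model to learn.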

dmshar, 2021-06-14

It seems to me that you have some kind of misunderstanding at the most basic level.
Let's figure it out from the beginning. You have a dataset with two columns. That is, you have one object that generates pairs of values sequentially in time, and it has generated 4000 such pairs.
You are making a prediction, but of what from what? The graph shows one series of numbers, and on the x-axis, obviously, the observation number. That is, you have one value that depends, first, on time and, second, possibly, on the second value. In essence this is a multivariate (two-dimensional) regression, and your dataset should really contain not two but three columns (the two you described + time).
With such a formulation, in order to predict the target variable you need to feed in the time value for which you are forecasting (that part is simple) and the value of the second variable, and you cannot simply take that from anywhere.
A multivariate time series model, even one built with an LSTM, is handled by a different scheme than an ordinary one-dimensional time series: special multi-step decision schemes and some other tricks are used. Describing all of this in one forum message is too much, so instead here are links to articles that describe in detail how these tasks are solved, including with Keras:
https://www.machinelearningmastery.ru/multivariate...
https://www.machinelearningmastery.ru/how-to-devel...
https://habr.com/ru/post/495884/
https://habr.com/en/post/505338/
Take a look; once you work it out, your task can be solved easily. If you have questions, ask and we will try to help further.
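One of the multi-step schemes those articles cover is direct multi-step forecasting: each training window is paired with a vector of the next N target values, so the network outputs all N steps at once instead of being fed its own predictions back. A sketch of just the target construction in NumPy (the `seq_length` and `n_steps` values are arbitrary):

```python
import numpy as np

def make_multistep_windows(data, seq_length, n_steps):
    """X: windows over both columns; y: the next n_steps values of
    column 0 after each window (direct multi-step targets)."""
    X, y = [], []
    for i in range(seq_length, data.shape[0] - n_steps + 1):
        X.append(data[i - seq_length:i])   # input window, shape (seq_length, 2)
        y.append(data[i:i + n_steps, 0])   # vector of the next n_steps targets
    return np.array(X), np.array(y)

data = np.arange(40, dtype=float).reshape(20, 2)
X, y = make_multistep_windows(data, seq_length=5, n_steps=3)
print(X.shape, y.shape)  # (13, 5, 2) (13, 3)
```

An LSTM with a `Dense(n_steps)` output head can then be trained on such (X, y) pairs, sidestepping the error accumulation that recursive one-step prediction suffers from.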
