R
R
radio0072021-05-22 19:20:39
Python
radio007, 2021-05-22 19:20:39

I build a bitcoin price forecast using linear regression. But the predicted values ​​are the same. Where is the mistake?

I take data from https://www.blockchain.com/en/charts#block .

My code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

pd.set_option('display.max_columns', None)

data = pd.read_csv('data.csv', delimiter=';', index_col=['Time'], dayfirst=True)
data.index = pd.to_datetime(data.index, format='%d.%m.%Y')
data = data.resample('W').mean()
y = data['market_price']
x = data.drop(['market_price'], axis=1)

models = [# LinearRegression(),  # метод наименьших квадратов
          RandomForestRegressor(n_estimators=100, max_features='sqrt'),  # случайный лес
          # KNeighborsRegressor(n_neighbors=6),  # метод ближайших соседей
          # SVR(kernel='linear'),  # метод опорных векторов с линейным ядром
          ]

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=0)

model = models[0].fit(X_train, Y_train)

r_2 = {'R2_Y': r2_score(Y_test, model.predict(X_test))}
print(r_2)

pred_test = pd.DataFrame({
    'pred': model.predict(X_test),
    'real': Y_test
})

pred_test.plot()
plt.show()

new_dates = pd.date_range('2021-05-09', '2022-01-02', freq='W')
new_dates = pd.Index(x.index) | new_dates
x2 = pd.DataFrame({'Time': new_dates})
y2 = pd.DataFrame({'Time': new_dates})
x_new = pd.merge(x, x2, on='Time', how='right')
y_new = pd.merge(y, y2, on='Time', how='right')
x_new = x_new.set_index('Time')
x_new = x_new.fillna(0)
y_new = y_new.set_index('Time')

model_2 = models[0].fit(x, y)

r_2 = {'R2_Y': r2_score(y, model_2.predict(x))}
print(r_2)

pred = pd.DataFrame({
    'pred': model_2.predict(x_new),
    'real': y_new.market_price
})

pred.plot()
plt.show()


And here is the result:

60a92e6365a60132718905.png

60a92f09b44f3230609174.png

Sample Data

market_price  trade_volume      hashrate  transactions_per_day
Time                                                                      
2016-05-08    452.953333  9.812436e+05  1.409150e+06         227158.000000
2016-05-15    453.300000  1.940460e+06  1.337420e+06         235977.500000
2016-05-22    445.025000  2.009665e+06  1.400187e+06         201746.500000
2016-05-29    471.270000  3.097414e+06  1.520550e+06         218714.666667
2016-06-05    549.925000  5.691195e+06  1.302883e+06         227920.000000


Who knows what could be the problem, please tell me.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
D
dmshar, 2021-05-25
@dmshar

Yes, I already answered this question. The problem is that a multivariate regression model is being built, i.e. the output variable depends not only on time, but also on a number of additional parameters. And when predicting a move, only the time and zeroed values ​​of the remaining parameters are supplied (this can be seen in the code). So it turns out that in fact at the output we get the value of the coefficient b0.
Here the problem is methodological - I don’t understand how the value of the price can be predicted, for example, in terms of sales at the same moment . Those. The model is not correct in essence and it needs to be substantially revised. Well , at leasttake these values ​​"for yesterday" and predict "for today". (This is my favorite thesis about what happens when they start learning a tool before theory and reduce all machine learning to the ability to apply the .fit () method.
Not to mention the fact that predicting values ​​​​in the Forex market or blockchain is in itself still that trash.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question