Answer the question
In order to leave comments, you need to log in
I build a bitcoin price forecast using linear regression. But the predicted values are the same. Where is the mistake?
I take data from https://www.blockchain.com/en/charts#block .
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
pd.set_option('display.max_columns', None)
data = pd.read_csv('data.csv', delimiter=';', index_col=['Time'], dayfirst=True)
data.index = pd.to_datetime(data.index, format='%d.%m.%Y')
data = data.resample('W').mean()
y = data['market_price']
x = data.drop(['market_price'], axis=1)
models = [# LinearRegression(), # метод наименьших квадратов
RandomForestRegressor(n_estimators=100, max_features='sqrt'), # случайный лес
# KNeighborsRegressor(n_neighbors=6), # метод ближайших соседей
# SVR(kernel='linear'), # метод опорных векторов с линейным ядром
]
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=0)
model = models[0].fit(X_train, Y_train)
r_2 = {'R2_Y': r2_score(Y_test, model.predict(X_test))}
print(r_2)
pred_test = pd.DataFrame({
'pred': model.predict(X_test),
'real': Y_test
})
pred_test.plot()
plt.show()
new_dates = pd.date_range('2021-05-09', '2022-01-02', freq='W')
new_dates = pd.Index(x.index) | new_dates
x2 = pd.DataFrame({'Time': new_dates})
y2 = pd.DataFrame({'Time': new_dates})
x_new = pd.merge(x, x2, on='Time', how='right')
y_new = pd.merge(y, y2, on='Time', how='right')
x_new = x_new.set_index('Time')
x_new = x_new.fillna(0)
y_new = y_new.set_index('Time')
model_2 = models[0].fit(x, y)
r_2 = {'R2_Y': r2_score(y, model_2.predict(x))}
print(r_2)
pred = pd.DataFrame({
'pred': model_2.predict(x_new),
'real': y_new.market_price
})
pred.plot()
plt.show()
market_price trade_volume hashrate transactions_per_day
Time
2016-05-08 452.953333 9.812436e+05 1.409150e+06 227158.000000
2016-05-15 453.300000 1.940460e+06 1.337420e+06 235977.500000
2016-05-22 445.025000 2.009665e+06 1.400187e+06 201746.500000
2016-05-29 471.270000 3.097414e+06 1.520550e+06 218714.666667
2016-06-05 549.925000 5.691195e+06 1.302883e+06 227920.000000
Answer the question
In order to leave comments, you need to log in
Yes, I already answered this question. The problem is that a multivariate regression model is being built, i.e. the output variable depends not only on time, but also on a number of additional parameters. And when predicting a move, only the time and zeroed values of the remaining parameters are supplied (this can be seen in the code). So it turns out that in fact at the output we get the value of the coefficient b0.
Here the problem is methodological - I don’t understand how the value of the price can be predicted, for example, in terms of sales at the same moment . Those. The model is not correct in essence and it needs to be substantially revised. Well , at leasttake these values "for yesterday" and predict "for today". (This is my favorite thesis about what happens when they start learning a tool before theory and reduce all machine learning to the ability to apply the .fit () method.
Not to mention the fact that predicting values in the Forex market or blockchain is in itself still that trash.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question