This guide walks through a Bitcoin (BTC) price-prediction pipeline that combines historical price data, sentiment scores, and technical indicators in XGBoost models, with a GARCH model supplying volatility estimates. Each component and step is broken down below.
1. Data Collection
Objective: Gather and merge historical Bitcoin price data with sentiment scores.
Price Data:
- Fetched using yfinance for BTC-USD (Open, High, Low, Close, Volume).
- Timeframe: 3 years prior to the latest sentiment data date.
Sentiment Data:
- Loaded from bitcoin_sentiments_21_24.csv.
- Interpolated linearly to fill gaps; defaults to 0.5 if missing.
Output:
- Merged dataset saved as bitcoin_historical.csv.
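The merge step described above can be sketched with pandas. This is a minimal illustration, not the original script: the function name, the `sentiment` column name, and the choice to interpolate only interior gaps are assumptions, and the layout of the sentiment CSV is not given in the source.

```python
import numpy as np
import pandas as pd


def merge_prices_with_sentiment(prices: pd.DataFrame, sentiment: pd.Series) -> pd.DataFrame:
    """Attach a daily sentiment score to each price row.

    prices: OHLCV frame indexed by date.
    sentiment: scores indexed by date, possibly with missing days.
    """
    # Align sentiment to the price calendar; days without a score become NaN.
    aligned = sentiment.reindex(prices.index)
    # Linear interpolation fills gaps between known scores (assumed behaviour);
    # anything still missing at either end falls back to the neutral 0.5.
    aligned = aligned.interpolate(method="linear", limit_area="inside").fillna(0.5)
    merged = prices.copy()
    merged["sentiment"] = aligned
    return merged
```

In the workflow described here, `prices` would come from a yfinance download of BTC-USD and the result would be written out with `merged.to_csv("bitcoin_historical.csv")`.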
2. Feature Engineering
Objective: Enhance model accuracy with technical and derived features.
Technical Indicators:
- Moving Averages: SMA (5, 20, 50 days).
- Momentum: RSI (14 days), MACD.
- Volatility: ATR (30 days), Bollinger Bands.
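For readers who want to see what these indicators compute, here is a plain-pandas sketch. The article's library list suggests the original relies on the ta package, so values may differ slightly from this version; the column names, the MACD spans (12/26/9), the Bollinger settings (20 days, 2 standard deviations), and the simple rolling mean used for ATR are assumptions based on common defaults.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Add SMA, RSI, MACD, ATR, and Bollinger Band columns to an OHLC frame."""
    out = df.copy()
    close = out["Close"]

    # Simple moving averages over 5, 20, and 50 days.
    for window in (5, 20, 50):
        out[f"sma_{window}"] = close.rolling(window).mean()

    # RSI (14 days) with Wilder's smoothing, written as an EMA with alpha = 1/14.
    delta = close.diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    avg_loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
    out["rsi_14"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD: 12-day EMA minus 26-day EMA, with a 9-day signal line.
    macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    out["macd"] = macd
    out["macd_signal"] = macd.ewm(span=9, adjust=False).mean()

    # ATR (30 days): rolling mean of the true range.
    prev_close = close.shift(1)
    true_range = pd.concat(
        [out["High"] - out["Low"], (out["High"] - prev_close).abs(), (out["Low"] - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    out["atr_30"] = true_range.rolling(30).mean()

    # Bollinger Bands: 20-day SMA plus/minus two rolling standard deviations.
    std_20 = close.rolling(20).std()
    out["bb_upper"] = out["sma_20"] + 2 * std_20
    out["bb_lower"] = out["sma_20"] - 2 * std_20
    return out
```

The first rows of each rolling column are NaN until the window fills, which is one reason a cleaning step is needed before training.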
Additional Features:
- Daily returns, lagged closing prices (1–3 days), volume interactions.
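A short sketch of these derived features follows. The source does not define "volume interactions", so the volume-times-return column below is a hypothetical stand-in, as are all column names.

```python
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add daily returns, lagged closes, and one volume interaction term."""
    out = df.copy()
    # Daily simple return: fractional change in the close from the previous day.
    out["return_1d"] = out["Close"].pct_change()
    # Closing prices from 1, 2, and 3 days earlier.
    for lag in (1, 2, 3):
        out[f"close_lag_{lag}"] = out["Close"].shift(lag)
    # Hypothetical interaction term: volume weighted by the day's return.
    out["volume_x_return"] = out["Volume"] * out["return_1d"]
    return out
```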
GARCH Model:
- Estimates volatility using Student-t distribution.
- Cleaning: Backfills missing values.
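The source does not state the GARCH order. Assuming the common GARCH(1,1) specification with Student-t innovations, the model for the daily return $r_t$ can be written as follows, where $\mu$, $\omega$, $\alpha$, $\beta$, and the degrees of freedom $\nu$ are estimated from the data:

```latex
\begin{aligned}
r_t &= \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad z_t \sim t_\nu \ \text{(scaled to unit variance)} \\
\sigma_t^2 &= \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2, \qquad \omega > 0,\ \alpha, \beta \ge 0,\ \alpha + \beta < 1
\end{aligned}
```

Under this assumption, the conditional volatility $\sigma_t$ is the series that would be added as a feature. In the arch package, a specification of this form corresponds to `arch_model(returns, vol="GARCH", p=1, q=1, dist="t")`.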
3. Data Preprocessing
Objective: Prepare data for machine learning.
- Target Variable: Next day’s closing price (Close.shift(-1)).
- Features: Includes prices, indicators, sentiment.
- Scaling: Normalized via MinMaxScaler.
- Train-Test Split: 80% training, 20% testing.
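The preprocessing steps above might look like the sketch below. Two details are assumptions rather than facts from the source: the split is chronological (no shuffling), and the scaler is fitted on the training rows only. The function and argument names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def prepare_data(df: pd.DataFrame, feature_cols: list[str], train_frac: float = 0.8):
    """Build the next-day target, split chronologically, and scale the features."""
    data = df.copy()
    # Target: the following day's close. The last row has no target and is dropped.
    data["target"] = data["Close"].shift(-1)
    data = data.dropna(subset=feature_cols + ["target"])

    # Chronological 80/20 split, so every test row is later than every training row.
    split = int(len(data) * train_frac)
    train, test = data.iloc[:split], data.iloc[split:]

    # Fit the scaler on training rows only so test-set ranges do not leak in.
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(train[feature_cols])
    X_test = scaler.transform(test[feature_cols])
    return X_train, X_test, train["target"].to_numpy(), test["target"].to_numpy(), scaler
```

With this design, scaled test values can fall outside the 0 to 1 range whenever test-period prices exceed the training range, which is common for a trending asset like BTC.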
4. Model Training
Objective: Train XGBoost models for price prediction.
Hyperparameter Tuning:
- Uses RandomizedSearchCV (50 iterations, 7-fold CV).
Two Models:
- Excludes Sentiment.
- Includes Sentiment.
- Loss Function: Minimizes squared error (reg:squarederror).
5. Model Evaluation
Objective: Compare model performance.
Metrics:
- MSE, RMSE, MAE, MAPE, R-squared.
- Output: Saved to sentiment_comparison_metrics.csv.
Feature Importance:
- Visualized for both models.
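The five metrics listed in this section have standard definitions, shown here as a small NumPy helper. The function name and the dictionary keys are illustrative, not taken from the original script.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute MSE, RMSE, MAE, MAPE (in percent), and R-squared."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        # MAPE assumes no true value is zero, which holds for BTC prices.
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Calling this once per model and placing the two dictionaries side by side in a DataFrame gives the kind of comparison table that could be saved to sentiment_comparison_metrics.csv.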
6. Predictions Workflow
Objective: Predict prices over the most recent 180 days of history and forecast the next 90 days.
Historical Prediction:
- Compares predicted vs. actual prices.
Future Prediction:
- Simulates prices using GARCH volatility and drift rates.
- Updates features dynamically.
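The source does not spell out the simulation scheme, so the following is one plausible sketch: a single price path driven by a GARCH(1,1) variance recursion, Student-t shocks, and a constant daily drift applied as a log return. All parameter names and the default degrees of freedom are assumptions; in practice the GARCH parameters would come from the fitted volatility model.

```python
import numpy as np


def simulate_future_prices(last_price, last_resid, last_var, omega, alpha, beta,
                           drift, nu=8.0, horizon=90, seed=0):
    """Simulate one price path with GARCH(1,1) variance and a constant daily drift.

    Returns are in decimal units per day; parameter values are supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    prices = np.empty(horizon)
    price, resid, var = last_price, last_resid, last_var
    for t in range(horizon):
        # GARCH(1,1) update: today's variance from yesterday's shock and variance.
        var = omega + alpha * resid ** 2 + beta * var
        # Student-t shock rescaled to unit variance (requires nu > 2).
        z = rng.standard_t(nu) * np.sqrt((nu - 2) / nu)
        resid = np.sqrt(var) * z
        # Compound the drift plus the shock as a log return.
        price = price * np.exp(drift + resid)
        prices[t] = price
    return prices
```

In the workflow described here, each simulated day would also be appended to the history so that indicators and lagged features can be recomputed before the next step; that feedback loop is omitted from this sketch.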
Output:
- Predictions saved as CSV files.
- Visualized in combined_historical_and_future_prediction.png.
Key Components
Libraries:
- Pandas, NumPy, yfinance, ta, arch, sklearn.
Models:
- XGBoost (price prediction), GARCH (volatility).
- Sentiment Impact: Quantified via comparative analysis.
FAQs
Q1. Why use sentiment analysis for Bitcoin price prediction?
A: Sentiment analysis captures market psychology, which can influence price movements beyond technical factors.
Q2. How does GARCH improve predictions?
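A: GARCH models volatility clustering, the tendency of turbulent days to follow turbulent days, so the price simulations use a time-varying volatility estimate rather than a fixed one. This gives a more realistic picture of risk.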
Q3. What’s the advantage of XGBoost over traditional models?
A: XGBoost handles non-linear relationships and feature interactions efficiently, improving prediction accuracy.
Q4. How far into the future can this model predict reliably?
A: Short-term predictions (30–90 days) are the more reliable ones; over longer horizons, accuracy depends heavily on how stable the market remains.
This script merges traditional finance techniques with modern machine learning, offering a robust tool for cryptocurrency traders and analysts.