TFT-ACB-XML: Decision-Level Integration of Customized Temporal Fusion Transformer and Attention-BiLSTM with XGBoost Meta-Learner for BTC Price Forecasting
- URL: http://arxiv.org/abs/2602.12380v1
- Date: Thu, 12 Feb 2026 20:20:56 GMT
- Title: TFT-ACB-XML: Decision-Level Integration of Customized Temporal Fusion Transformer and Attention-BiLSTM with XGBoost Meta-Learner for BTC Price Forecasting
- Authors: Raiz Ud Din, Saddam Hussain Khan
- Abstract summary: Existing deep learning models often struggle with interpretability and generalization across diverse market conditions. This research presents a hybrid stacked-generalization framework, TFT-ACB-XML, for BTC closing price prediction. Empirical validation using BTC data from October 1, 2014, to January 5, 2026, shows improved performance of the proposed framework.
- Score: 0.7857499581522376
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate forecasting of Bitcoin (BTC) prices has always been a challenge because decentralized markets are non-linear, highly volatile, and temporally irregular. Existing deep learning models often struggle with interpretability and generalization across diverse market conditions. This research presents a hybrid stacked-generalization framework, TFT-ACB-XML, for BTC closing price prediction. The framework integrates two parallel base learners: a customized Temporal Fusion Transformer (TFT) and an Attention-Customized Bidirectional Long Short-Term Memory network (ACB), followed by an XGBoost regressor as the meta-learner. The customized TFT model handles long-range dependencies and global temporal dynamics via variable selection networks and interpretable single-head attention. The ACB module uses a new attention mechanism alongside the customized BiLSTM to capture short-term sequential dependencies. Predictions from the customized TFT and ACB are combined through an error-reciprocal weighting strategy: the weights are derived from validation performance, so the model with the lower prediction error receives the higher weight. Finally, the framework concatenates these weighted outputs into a feature vector and feeds it to an XGBoost regressor, which captures non-linear residuals and produces the final BTC closing price prediction. Empirical validation using BTC data from October 1, 2014, to January 5, 2026, shows improved performance of the proposed framework compared to recent deep learning and Transformer baseline models. The results show a MAPE of 0.65%, an MAE of 198.15, and an RMSE of 258.30 for one-step-ahead out-of-sample forecasts under walk-forward evaluation on the test block. The evaluation period spans the 2024 BTC halving and the spot ETF (exchange-traded fund) period, which coincide with major liquidity and volatility shifts.
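To make the decision-level integration in the abstract concrete, the sketch below illustrates error-reciprocal weighting followed by XGBoost stacking. It is a minimal sketch under stated assumptions: the normalization w_i = (1/e_i) / (1/e_TFT + 1/e_ACB) is one common form consistent with the abstract's description (the paper may use a different error metric or normalization), and all numbers, array shapes, and hyperparameters below are illustrative, not taken from the paper.

```python
import numpy as np
from xgboost import XGBRegressor  # assumed meta-learner backend; pip install xgboost

def reciprocal_weights(val_errors):
    """Error-reciprocal weighting: lower validation error -> higher weight.

    `val_errors` holds one positive error score (e.g. validation MAE)
    per base learner; the returned weights are normalized to sum to 1.
    """
    inv = 1.0 / np.asarray(val_errors, dtype=float)
    return inv / inv.sum()

# Hypothetical base-learner predictions for the same targets.
# In the paper these would come from the customized TFT and ACB modules.
pred_tft = np.array([64200.0, 64510.0, 63980.0])
pred_acb = np.array([64150.0, 64620.0, 64010.0])
y_true   = np.array([64300.0, 64480.0, 64050.0])  # actual BTC closes

# Illustrative validation MAEs; the TFT's lower error earns it more weight.
w_tft, w_acb = reciprocal_weights([210.0, 265.0])

# Concatenate the weighted base outputs into the meta-learner's feature vector.
X_meta = np.column_stack([w_tft * pred_tft, w_acb * pred_acb])

# The XGBoost meta-learner fits the remaining non-linear residual structure.
meta = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
meta.fit(X_meta, y_true)
final_pred = meta.predict(X_meta)  # final BTC closing-price predictions
```

In practice the meta-learner would be fit on out-of-fold or validation-block base predictions to avoid leakage; fitting and predicting on the same rows here is purely to keep the sketch short.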
Related papers
- ZIP-RC: Optimizing Test-Time Compute via Zero-Overhead Joint Reward-Cost Prediction [57.799425838564]
We present ZIP-RC, an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost. ZIP-RC improves accuracy by up to 12% over majority voting at equal or lower average cost.
arXiv Detail & Related papers (2025-12-01T09:44:31Z)
- Optimization of Deep Learning Models for Dynamic Market Behavior Prediction [4.594360512414794]
We study multi-horizon demand forecasting on e-commerce transactions using the UCI Online Retail II dataset. We present a hybrid sequence model that combines multi-scale temporal convolutions, a gated recurrent module, and time-aware self-attention. Results show consistent accuracy gains and improved performance on peak/holiday periods.
arXiv Detail & Related papers (2025-11-24T13:30:52Z)
- Temporal Fusion Transformer for Multi-Horizon Probabilistic Forecasting of Weekly Retail Sales [5.023398151088689]
We present a novel study of weekly Walmart sales using a Temporal Fusion Transformer (TFT). The pipeline produces 1–5-week-ahead probabilistic forecasts via Quantile Loss. On a fixed 2012 hold-out dataset, TFT achieves an RMSE of 57.9k USD per store-week and an $R^2$ of 0.9875.
arXiv Detail & Related papers (2025-11-01T13:34:29Z)
- Test time training enhances in-context learning of nonlinear functions [51.56484100374058]
Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction. We investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time.
arXiv Detail & Related papers (2025-09-30T03:56:44Z)
- Adaptive Temporal Fusion Transformers for Cryptocurrency Price Prediction [0.0]
This paper introduces an adaptive TFT modeling approach leveraging dynamic subseries lengths and pattern-based categorization to enhance short-term forecasting. Our results on ETH-USDT 10-minute data over a two-month test period demonstrate that our approach significantly outperforms baseline fixed-length TFT and LSTM models in prediction accuracy and simulated trading profitability.
arXiv Detail & Related papers (2025-09-06T20:04:46Z)
- End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning [0.0]
We develop a rotation-invariant neural network that provides the global minimum-variance portfolio. This explicit mathematical mapping offers clear interpretability of each module's role. A single model can be calibrated on panels of a few hundred stocks and applied, without retraining, to one thousand US equities.
arXiv Detail & Related papers (2025-07-02T17:27:29Z)
- A Novel Decision Ensemble Framework: Customized Attention-BiLSTM and XGBoost for Speculative Stock Price Forecasting [2.011511123338945]
This paper proposes a novel framework, CAB-XDE, for predicting the daily closing price of the speculative stock Bitcoin-USD (BTC-USD).
The CAB-XDE framework integrates a customized bidirectional long short-term memory (BiLSTM) network with an attention mechanism and the XGBoost algorithm.
The proposed CAB-XDE framework is empirically validated on the volatile Bitcoin market, with data sourced from Yahoo Finance.
arXiv Detail & Related papers (2024-01-05T17:13:30Z)
- Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line [65.14099135546594]
Recent test-time adaptation (TTA) methods drastically strengthen the accuracy-on-the-line (ACL) and agreement-on-the-line (AGL) trends in models, even under shifts where models previously showed very weak correlations.
Our results show that by combining TTA with AGL-based estimation methods, we can estimate the out-of-distribution (OOD) performance of models with high precision for a broader set of distribution shifts.
arXiv Detail & Related papers (2023-10-07T23:21:25Z)
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
The Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z)
- Diffusion Variational Autoencoder for Tackling Stochasticity in Multi-Step Regression Stock Price Prediction [54.21695754082441]
Multi-step stock price prediction over a long-term horizon is crucial for forecasting its volatility.
Current solutions to multi-step stock price prediction are mostly designed for single-step, classification-based predictions.
We combine a deep hierarchical variational autoencoder (VAE) with diffusion probabilistic techniques to perform sequence-to-sequence (seq2seq) stock prediction.
Our model is shown to outperform state-of-the-art solutions in terms of its prediction accuracy and variance.
arXiv Detail & Related papers (2023-08-18T16:21:15Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Stochastic Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)