Temporal Fusion Transformer for Multi-Horizon Probabilistic Forecasting of Weekly Retail Sales
- URL: http://arxiv.org/abs/2511.00552v1
- Date: Sat, 01 Nov 2025 13:34:29 GMT
- Title: Temporal Fusion Transformer for Multi-Horizon Probabilistic Forecasting of Weekly Retail Sales
- Authors: Santhi Bharath Punati, Sandeep Kanta, Udaya Bhasker Cheerala, Madhusudan G Lanjewar, Praveen Damacharla,
- Abstract summary: We present a novel study of weekly Walmart sales using a Temporal Fusion Transformer (TFT). The pipeline produces 1--5-week-ahead probabilistic forecasts via Quantile Loss. On a fixed 2012 hold-out dataset, TFT achieves an RMSE of \$57.9k USD per store-week and an $R^2$ of 0.9875.
- Score: 5.023398151088689
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate multi-horizon retail forecasts are critical for inventory and promotions. We present a novel study of weekly Walmart sales (45 stores, 2010--2012) using a Temporal Fusion Transformer (TFT) that fuses static store identifiers with time-varying exogenous signals (holidays, CPI, fuel price, temperature). The pipeline produces 1--5-week-ahead probabilistic forecasts via Quantile Loss, yielding calibrated 90\% prediction intervals and interpretability through variable-selection networks, static enrichment, and temporal attention. On a fixed 2012 hold-out dataset, TFT achieves an RMSE of \$57.9k USD per store-week and an $R^2$ of 0.9875. Across a 5-fold chronological cross-validation, the averages are RMSE = \$64.6k USD and $R^2$ = 0.9844, outperforming the XGB, CNN, LSTM, and CNN-LSTM baseline models. These results demonstrate practical value for inventory planning and holiday-period optimization, while maintaining model transparency.
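The abstract's calibrated 90% prediction intervals come from training against the quantile (pinball) loss at the 0.05 and 0.95 levels. As a minimal sketch of that loss (the array values below are hypothetical, not the paper's data):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for a single quantile level q in (0, 1).

    Under-predictions are weighted by q, over-predictions by (1 - q),
    so minimizing it drives y_pred toward the q-th conditional quantile.
    """
    err = y_true - y_pred
    return np.mean(np.maximum(q * err, (q - 1) * err))

# Hypothetical store-week sales and quantile forecasts (in $k).
y_true = np.array([100.0, 120.0, 90.0])
y_lo   = np.array([ 80.0, 100.0, 85.0])   # 5th-percentile forecasts
y_hi   = np.array([130.0, 150.0, 110.0])  # 95th-percentile forecasts

# Training against q = 0.05 and q = 0.95 yields a 90% prediction interval.
loss = pinball_loss(y_true, y_lo, 0.05) + pinball_loss(y_true, y_hi, 0.95)
```

In the TFT the loss is summed over all forecast horizons and quantile levels; the two-level version above is just the interval-producing special case.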
Related papers
- TFT-ACB-XML: Decision-Level Integration of Customized Temporal Fusion Transformer and Attention-BiLSTM with XGBoost Meta-Learner for BTC Price Forecasting [0.7857499581522376]
Existing deep learning models often struggle with interpretability and generalization across diverse market conditions. This research presents a hybrid stacked-generalization framework, TFT-ACB-XML, for BTC closing-price prediction. Empirical validation using BTC data from October 1, 2014, to January 5, 2026, shows improved performance of the proposed framework.
arXiv Detail & Related papers (2026-02-12T20:20:56Z) - Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking [51.56484100374058]
We evaluate whether a fully automatic, purely feedback-driven ESN can serve as a competitive alternative to widely used statistical forecasting methods. Forecast accuracy is measured using MASE and sMAPE and benchmarked against simple baselines, such as drift and seasonal naive, as well as statistical models.
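The MASE and sMAPE metrics named above are standard scale-free accuracy measures; a minimal sketch of both (using the common definitions, with sMAPE reported in percent and MASE scaled by the in-sample seasonal-naive MAE):

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    denom = np.abs(y_true) + np.abs(y_pred)
    return 100.0 * np.mean(2.0 * np.abs(y_true - y_pred) / denom)

def mase(y_true, y_pred, y_train, m=1):
    """Mean absolute scaled error: forecast MAE divided by the
    in-sample MAE of the seasonal-naive forecast with period m."""
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae
```

A MASE below 1 means the forecast beats the seasonal-naive baseline on average; the drift baseline mentioned in the abstract would be compared the same way.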
arXiv Detail & Related papers (2026-02-03T16:01:22Z) - LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G [85.58816960936069]
Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Real-Time (Near-RT) latency and computational constraints. This paper investigates a post-Transformer paradigm for efficient radio telemetry forecasting. We propose a quantum-inspired state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels.
arXiv Detail & Related papers (2026-01-18T12:08:38Z) - TF-CoDiT: Conditional Time Series Synthesis with Diffusion Transformers for Treasury Futures [9.869634509510016]
Diffusion Transformers (DiT) have achieved milestones in synthesizing financial time-series data, such as stock prices and order flows. This work emphasizes the characteristics of treasury futures data, including its low volume, market dependencies, and the grouped correlations among multiple variables. We propose TF-CoDiT, the first DiT framework for language-controlled treasury futures synthesis.
arXiv Detail & Related papers (2026-01-17T02:27:56Z) - Test time training enhances in-context learning of nonlinear functions [51.56484100374058]
Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction. We investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time.
arXiv Detail & Related papers (2025-09-30T03:56:44Z) - Short-Term Forecasting of Energy Production and Consumption Using Extreme Learning Machine: A Comprehensive MIMO based ELM Approach [0.0]
A novel methodology for short-term energy forecasting using an Extreme Learning Machine (ELM) is proposed. Using six years of hourly data collected in Corsica (France) from multiple energy sources, our approach predicts both individual energy outputs and total production. The model maintains high accuracy up to five hours ahead, beyond which renewable energy sources become increasingly volatile.
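An ELM, as referenced above, fixes a random hidden layer and fits only a linear readout by least squares, which naturally extends to MIMO outputs. A minimal sketch on toy data (the data, sizes, and seed are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MIMO regression problem: 3 inputs -> 2 outputs.
X = rng.normal(size=(200, 3))
Y = np.column_stack([X @ np.array([1.0, -2.0, 0.5]),  # linear target
                     np.sin(X[:, 0])])                # nonlinear target

n_hidden = 100
W = rng.normal(size=(3, n_hidden))   # random input weights, never trained
b = rng.normal(size=n_hidden)        # random biases, never trained
H = np.tanh(X @ W + b)               # fixed nonlinear hidden features

# The only "training" step: solve the linear readout by least squares.
beta, *_ = np.linalg.lstsq(H, Y, rcond=None)
Y_hat = H @ beta
```

Because training reduces to one linear solve, ELMs fit in a fraction of the time gradient-based networks need, which is the appeal for short-horizon energy forecasting.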
arXiv Detail & Related papers (2025-08-18T09:37:54Z) - CoIFNet: A Unified Framework for Multivariate Time Series Forecasting with Missing Values [17.25081407284703]
Collaborative Imputation-Forecasting Network (CoIFNet) is a novel framework that unifies imputation and forecasting. CoIFNet takes the observed values, mask matrix, and timestamp embeddings as input, processing them sequentially. We demonstrate the effectiveness and computational efficiency of our proposed approach across diverse missing-data scenarios.
arXiv Detail & Related papers (2025-06-16T03:15:12Z) - Sundial: A Family of Highly Capable Time Series Foundation Models [64.6322079384575]
We introduce Sundial, a family of native, flexible, and scalable time series foundation models. Our models are pre-trained without specifying any prior distribution and can generate multiple probable predictions. Sundial achieves state-of-the-art results on both point and probabilistic forecasting benchmarks with just-in-time inference speed.
arXiv Detail & Related papers (2025-02-02T14:52:50Z) - Neural and Time-Series Approaches for Pricing Weather Derivatives: Performance and Regime Adaptation Using Satellite Data [0.0]
This paper studies the pricing of weather-derivative (WD) contracts on temperature and precipitation. We benchmark a harmonic-regression/ARMA model against a feed-forward neural network (NN), finding that the NN reduces out-of-sample mean-squared error (MSE). For precipitation, we employ a compound Poisson--Gamma framework: shape and scale parameters are estimated via maximum likelihood estimation (MLE) and via a convolutional neural network (CNN) trained on 30-day rainfall sequences spanning multiple seasons.
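In the compound Poisson--Gamma framework named above, a period's total rainfall is a Poisson-distributed number of wet events, each contributing a Gamma-distributed depth. A minimal simulation sketch (the rate, shape, and scale values are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_monthly_rainfall(lam, shape, scale, n_months, days=30):
    """Monthly rainfall totals under a compound Poisson--Gamma model:
    N ~ Poisson(lam * days) wet events per month, each event's depth
    drawn i.i.d. from Gamma(shape, scale)."""
    totals = np.empty(n_months)
    for i in range(n_months):
        n_events = rng.poisson(lam * days)
        totals[i] = rng.gamma(shape, scale, size=n_events).sum()
    return totals

samples = simulate_monthly_rainfall(lam=0.2, shape=2.0, scale=5.0,
                                    n_months=5000)
# Expected monthly total: lam * days * shape * scale = 0.2 * 30 * 2 * 5 = 60.
```

Fitting the model then amounts to estimating (lam, shape, scale) from observed sequences, by MLE or, as the paper proposes, by a CNN regressor.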
arXiv Detail & Related papers (2024-11-18T19:54:28Z) - CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI type Transformer in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture temporal correlations among signals.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z) - Short-Term Electricity Load Forecasting Using the Temporal Fusion Transformer: Effect of Grid Hierarchies and Data Sources [0.0]
We study the potential of the Temporal Fusion Transformer (TFT) architecture for hourly short-term load forecasting.
We find that the TFT architecture does not offer higher predictive performance than a state-of-the-art LSTM model for day-ahead forecasting on the entire grid.
The results display significant improvements for the TFT when applied at the substation level with subsequent aggregation to the upper grid level.
arXiv Detail & Related papers (2023-05-17T20:33:51Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Deep learning for gravitational-wave data analysis: A resampling white-box approach [62.997667081978825]
We apply Convolutional Neural Networks (CNNs) to detect gravitational wave (GW) signals of compact binary coalescences, using single-interferometer data from LIGO detectors.
CNNs were quite precise at detecting noise but not sensitive enough to recall GW signals, meaning that CNNs are better suited to noise reduction than to generating GW triggers.
arXiv Detail & Related papers (2020-09-09T03:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.