TAT: Temporal-Aligned Transformer for Multi-Horizon Peak Demand Forecasting
- URL: http://arxiv.org/abs/2507.10349v1
- Date: Mon, 14 Jul 2025 14:51:24 GMT
- Title: TAT: Temporal-Aligned Transformer for Multi-Horizon Peak Demand Forecasting
- Authors: Zhiyuan Zhao, Sitan Yang, Kin G. Olivares, Boris N. Oreshkin, Stan Vitebsky, Michael W. Mahoney, B. Aditya Prakash, Dmitry Efimov
- Abstract summary: We propose the Temporal-Aligned Transformer (TAT), a multi-horizon forecaster leveraging a priori known context variables to improve predictive performance. Our model consists of an encoder and decoder, both embedded with a novel Temporal Alignment Attention (TAA) designed to learn context-dependent alignment for peak demand forecasting. We demonstrate that TAT brings up to a 30% accuracy improvement on peak demand forecasting while maintaining competitive overall performance compared to other state-of-the-art methods.
- Score: 51.37167759339485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-horizon time series forecasting has many practical applications, such as demand forecasting. Accurate demand prediction is critical for the buying and inventory decisions of e-commerce and physical retailers' supply chain management, and such predictions are typically required for future horizons extending tens of weeks. This is especially challenging during high-stakes sales events, when demand peaks are particularly difficult to predict accurately. However, these events are important not only for managing supply chain operations but also for ensuring a seamless shopping experience for customers. To address this challenge, we propose the Temporal-Aligned Transformer (TAT), a multi-horizon forecaster leveraging a priori known context variables, such as holiday and promotion event information, to improve predictive performance. Our model consists of an encoder and decoder, both embedded with a novel Temporal Alignment Attention (TAA) designed to learn context-dependent alignment for peak demand forecasting. We conduct extensive empirical analysis on two large-scale proprietary datasets from a large e-commerce retailer. We demonstrate that TAT brings up to a 30% accuracy improvement on peak demand forecasting while maintaining competitive overall performance compared to other state-of-the-art methods.
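The abstract does not give implementation details for TAA. The sketch below is a minimal, hypothetical PyTorch rendering of the idea as described: attention whose alignment is computed from the a priori known context variables (queries from future context, keys from past context) rather than from the demand series alone. All names, shapes, and the scoring rule (`TemporalAlignmentAttention`, `d_model`, `ctx_dim`) are assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a Temporal Alignment Attention (TAA) layer.
# Names, shapes, and the exact scoring rule are assumptions; the
# paper's actual formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAlignmentAttention(nn.Module):
    """Attention whose alignment is driven by a priori known context
    variables (e.g., holiday/promotion indicators) rather than by the
    demand series alone."""

    def __init__(self, d_model: int, ctx_dim: int):
        super().__init__()
        # Queries/keys come from the context features, values from the series.
        self.q_proj = nn.Linear(ctx_dim, d_model)
        self.k_proj = nn.Linear(ctx_dim, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, series_hidden, src_ctx, tgt_ctx):
        # series_hidden: (B, T_src, d_model) encoder states of past demand
        # src_ctx:       (B, T_src, ctx_dim) context for past steps
        # tgt_ctx:       (B, T_tgt, ctx_dim) known context for future steps
        q = self.q_proj(tgt_ctx)                # (B, T_tgt, d_model)
        k = self.k_proj(src_ctx)                # (B, T_src, d_model)
        v = self.v_proj(series_hidden)          # (B, T_src, d_model)
        scores = torch.bmm(q, k.transpose(1, 2)) * self.scale
        attn = F.softmax(scores, dim=-1)        # context-dependent alignment
        return torch.bmm(attn, v)               # (B, T_tgt, d_model)
```

Under this reading, a decoder step for an upcoming holiday would attend most strongly to past steps with similar context, e.g., the same event in a previous year, which is one plausible way "context-dependent alignment" could sharpen peak forecasts.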
Related papers
- LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data [63.777637042161544]
This paper introduces a novel forecast post-processor that fine-tunes large language models to incorporate unstructured semantic and contextual information and historical data. In an industry-scale retail application, we demonstrate that our technique yields statistically significant forecast improvements across several sets of products subject to holiday-driven demand surges.
arXiv Detail & Related papers (2024-12-03T16:18:42Z)
- Inter-Series Transformer: Attending to Products in Time Series Forecasting [5.459207333107234]
We develop a new Transformer-based forecasting approach using a shared, multi-task per-time series network.
We provide a case study applying our approach to successfully improve demand prediction for a medical device manufacturing company.
arXiv Detail & Related papers (2024-08-07T16:22:21Z)
- Large Scale Hierarchical Industrial Demand Time-Series Forecasting incorporating Sparsity [16.609280485541323]
We propose HAILS, a novel probabilistic hierarchical model that enables accurate and calibrated probabilistic forecasts across the hierarchy.
We deploy HAILS at a large chemical manufacturing company for a product demand forecasting application covering over ten thousand products, and observe a significant 8.5% improvement in forecast accuracy, with a 23% improvement for sparse time series.
arXiv Detail & Related papers (2024-07-02T20:40:08Z)
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improves generalization: the excess risk decreases as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Leveraging World Events to Predict E-Commerce Consumer Demand under Anomaly [32.54836258878438]
Time series sales forecasting for e-commerce is difficult during periods with many anomalies.
We propose a novel methodology based on transformers to construct an embedding of a day based on the relations of the day's events.
We empirically evaluate the methods over a large e-commerce products sales dataset, extracted from eBay.
arXiv Detail & Related papers (2024-05-22T21:05:35Z)
- Performative Time-Series Forecasting [64.03865043422597]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective. We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts. We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
arXiv Detail & Related papers (2023-10-09T18:34:29Z)
- Improved Sales Forecasting using Trend and Seasonality Decomposition with LightGBM [9.788039182463768]
We propose a new measure to indicate the unique impacts of the trend and seasonality components on a time series.
Our experiments show that the proposed strategy can achieve improved accuracy.
arXiv Detail & Related papers (2023-05-26T18:49:42Z)
- Approaching sales forecasting using recurrent neural networks and transformers [57.43518732385863]
We develop three alternatives for forecasting customer sales at the day/store/item level using deep learning techniques.
Our empirical results show that good performance can be achieved with a simple sequence-to-sequence architecture and minimal data preprocessing effort.
The proposed solution achieves an RMSLE of around 0.54, competitive with more problem-specific solutions proposed in the Kaggle competition.
arXiv Detail & Related papers (2022-04-16T12:03:52Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example in which THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information (see the sketch after this list).
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
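As a companion to the Transformer Hawkes Process entry above, here is a minimal, hypothetical sketch of how a self-attention hidden state can parameterize a continuous-time conditional intensity. The softplus form and the normalized elapsed-time term follow the commonly cited THP formulation, but every name and detail here (`HawkesIntensity`, `alpha`, the clamping) is an assumption rather than the paper's actual code.

```python
# Hypothetical sketch of a Transformer-Hawkes-style conditional intensity.
# The softplus form follows the commonly cited THP formulation; names and
# details are assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HawkesIntensity(nn.Module):
    """Maps the self-attention hidden state of the most recent event to a
    continuous-time conditional intensity lambda_k(t) per event type k."""

    def __init__(self, d_model: int, num_types: int):
        super().__init__()
        self.w = nn.Linear(d_model, num_types)  # influence of event history
        self.alpha = nn.Parameter(torch.full((num_types,), -0.1))  # time decay

    def forward(self, h_last, t, t_last):
        # h_last: (B, d_model) hidden state at the last observed event
        # t:      (B,) query times; t_last: (B,) last event times
        rel = (t - t_last) / t_last.clamp(min=1e-6)  # normalized elapsed time
        raw = self.alpha * rel.unsqueeze(-1) + self.w(h_last)
        return F.softplus(raw)                       # intensities > 0
```

Training would typically maximize the event-sequence log-likelihood, combining log-intensities at observed events with an integral of the intensity over the observation window (estimated by Monte Carlo in the usual THP setup).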
This list is automatically generated from the titles and abstracts of the papers on this site.