WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting
- URL: http://arxiv.org/abs/2410.03159v3
- Date: Wed, 12 Feb 2025 03:55:17 GMT
- Title: WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting
- Authors: Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang
- Abstract summary: We propose a Weighted Autoregressive Varying gatE (WAVE) attention mechanism equipped with both Autoregressive (AR) and Moving-average (MA) components.
It can adapt to various attention mechanisms, enhancing and decoupling their ability to capture long-range and local temporal patterns in time series data.
- Score: 9.114664059026767
- Abstract: We propose a Weighted Autoregressive Varying gatE (WAVE) attention mechanism equipped with both Autoregressive (AR) and Moving-average (MA) components. It can adapt to various attention mechanisms, enhancing and decoupling their ability to capture long-range and local temporal patterns in time series data. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements for local temporal impacts. Experimental results show that WAVE attention that incorporates the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.
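To make the ARMA-in-attention idea concrete, here is a minimal numpy sketch in the spirit of the abstract: a causal linear attention whose output is an AR term over past values plus an MA term over past residuals, with the MA weights produced indirectly by a data-dependent gate. The feature map, the `Wg` gate, and the residual definition are illustrative assumptions, not WAVE's actual parameterization; the point is that the MA state updates at the same O(d^2)-per-step cost as the AR state, so the time complexity of the underlying linear attention is preserved.

```python
import numpy as np

def feature_map(x):
    # Positive feature map (elu + 1), common in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def arma_linear_attention(x, d=16, seed=0):
    """Causal linear attention with an AR term over past values and an
    MA term over past residuals; the MA weights are generated indirectly
    from the input rather than stored as explicit parameters."""
    rng = np.random.default_rng(seed)
    d_in = x.shape[1]
    Wq, Wk, Wv, Wg = (rng.normal(0.0, d_in ** -0.5, (d_in, d)) for _ in range(4))
    q, k, v = feature_map(x @ Wq), feature_map(x @ Wk), x @ Wv

    T = x.shape[0]
    out = np.zeros((T, d))
    S_ar = np.zeros((d, d))   # running sum of k_s v_s^T   (AR state)
    S_ma = np.zeros((d, d))   # running sum of k_s eps_s^T (MA state)
    z = np.zeros(d)           # running sum of k_s         (normalizer)
    for t in range(T):
        S_ar += np.outer(k[t], v[t])
        z += k[t]
        denom = q[t] @ z + 1e-8
        ar = (q[t] @ S_ar) / denom            # AR term: weighted past values
        gate = np.tanh(x[t] @ Wg)             # indirectly generated MA weights
        ma = gate * ((q[t] @ S_ma) / denom)   # MA term: weighted past residuals
        out[t] = ar + ma
        S_ma += np.outer(k[t], v[t] - ar)     # residual ("innovation") at t
    return out

rng = np.random.default_rng(42)
x = rng.normal(size=(64, 8))
print(arma_linear_attention(x).shape)  # (64, 16)
```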
Related papers
- Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting [0.9361474110798144]
We show that a single linear attention layer can be interpreted as a dynamic vector autoregressive (VAR) structure.
By rearranging the weight generation, attention, and input-output flow, multi-layer linear attention can also be aligned as a VAR model.
We propose Structural Aligned Mixture of VAR, a linear Transformer variant that integrates interpretable dynamic VAR weights for multivariate TSF.
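The single-layer claim is easy to verify numerically: the output of an unnormalized causal linear attention layer can be rewritten as a VAR with data-dependent coefficient matrices A[t,s] = (q_t . k_s) Wv^T. A toy numpy check follows; the paper's construction, including normalization and the multi-layer alignment, is more general.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 4
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(0.0, d ** -0.5, (d, d)) for _ in range(3))

phi = lambda z: np.where(z > 0, z + 1.0, np.exp(np.minimum(z, 0.0)))
q, k, v = phi(x @ Wq), phi(x @ Wk), x @ Wv

# 1) Unnormalized causal linear attention.
y_attn = np.array([sum((q[t] @ k[s]) * v[s] for s in range(t + 1)) for t in range(T)])

# 2) The same output written as a dynamic VAR: y_t = sum_{s<=t} A[t,s] @ x_s,
#    with data-dependent coefficient matrices A[t,s] = (q_t . k_s) * Wv^T.
y_var = np.zeros_like(y_attn)
for t in range(T):
    for s in range(t + 1):
        A_ts = (q[t] @ k[s]) * Wv.T
        y_var[t] += A_ts @ x[s]

print(np.allclose(y_attn, y_var))  # True
```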
arXiv Detail & Related papers (2025-02-11T04:24:43Z)
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
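One plausible minimal realization of this reweighting, sketched under the assumption that the heavy-tailed decay acts multiplicatively on causal softmax weights before renormalization (Powerformer's exact form may differ):

```python
import numpy as np

def power_law_causal_attention(q, k, v, alpha=1.0):
    """Causal attention whose softmax weights are reweighted by a
    heavy-tailed power-law decay in the time lag, then renormalized."""
    T, d = q.shape
    scores = (q @ k.T) / np.sqrt(d)
    lags = np.arange(T)[:, None] - np.arange(T)[None, :]   # t - s
    decay = np.where(lags >= 0, (np.abs(lags) + 1.0) ** -alpha, 0.0)
    w = np.exp(scores - scores.max(axis=1, keepdims=True)) * decay
    w /= w.sum(axis=1, keepdims=True)                      # causal + decayed
    return w @ v

rng = np.random.default_rng(2)
q = k = v = rng.normal(size=(32, 8))
print(power_law_causal_attention(q, k, v, alpha=0.5).shape)  # (32, 8)
```

Larger `alpha` shrinks the effective receptive field, which matches the reported amplification of locality bias during training.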
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
- Enabling Autoregressive Models to Fill In Masked Tokens [50.9948753314669]
This work introduces MARIA (Masked and Autoregressive Infilling Architecture), a novel approach that achieves state-of-the-art masked infilling performance.
MARIA combines a pre-trained masked language model (MLM) and an AR model by training a linear decoder that takes their concatenated hidden states as input.
Our results demonstrate that MARIA significantly outperforms existing methods, namely discrete diffusion models, on masked infilling tasks.
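A toy version of that recipe, with random arrays standing in for the two frozen networks' hidden states: concatenate the states and train only a linear softmax decoder. All shapes and the training loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, d_mlm, d_ar, V = 128, 16, 16, 50

# Stand-ins for frozen hidden states from a masked model and an AR model.
h_mlm = rng.normal(size=(T, d_mlm))
h_ar = rng.normal(size=(T, d_ar))
targets = rng.integers(0, V, size=T)

h = np.concatenate([h_mlm, h_ar], axis=1)     # combined representation
W = rng.normal(0.0, 0.01, (d_mlm + d_ar, V))  # the only trained parameters

for _ in range(200):                          # plain softmax regression
    logits = h @ W
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    p[np.arange(T), targets] -= 1.0           # grad of cross-entropy wrt logits
    W -= 0.1 * (h.T @ p) / T                  # gradient step

print(((h @ W).argmax(1) == targets).mean())  # training accuracy on random data
```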
arXiv Detail & Related papers (2025-02-09T20:02:05Z)
- MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction [0.0]
We introduce the Multi-Attention Unit (MAUCell), which combines Generative Adversarial Networks (GANs) and attention mechanisms to improve video prediction.
The design balances temporal continuity against spatial accuracy to deliver reliable video prediction.
arXiv Detail & Related papers (2025-01-28T14:52:10Z)
- Transformer-Based Bearing Fault Detection using Temporal Decomposition Attention Mechanism [0.40964539027092917]
Bearing fault detection is a critical task in predictive maintenance, where accurate and timely fault identification can prevent costly downtime and equipment damage.
Traditional attention mechanisms in Transformer neural networks often struggle to capture the complex temporal patterns in bearing vibration data, leading to suboptimal performance.
We propose a novel attention mechanism, Temporal Decomposition Attention (TDA), which combines temporal bias encoding with seasonal-trend decomposition to capture both long-term dependencies and periodic fluctuations in time series data.
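The seasonal-trend half of that design can be sketched with the classic moving-average decomposition (trend = centered moving average, remainder = seasonal plus noise); the temporal-bias half would add a lag-dependent term to the attention scores, much like the power-law sketch above. TDA's exact decomposition and bias encoding may differ from this assumption.

```python
import numpy as np

def seasonal_trend_split(x, period=24):
    """Moving-average decomposition: trend = centered moving average,
    remainder = seasonal component plus noise."""
    pad = period // 2
    xp = np.pad(x, (pad, pad), mode="edge")
    kernel = np.ones(period + 1) / (period + 1)
    trend = np.convolve(xp, kernel, mode="valid")
    return trend, x - trend

rng = np.random.default_rng(4)
t = np.arange(256)
x = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
trend, remainder = seasonal_trend_split(x, period=24)
print(trend.shape, remainder.shape)  # (256,) (256,)
```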
arXiv Detail & Related papers (2024-12-15T16:51:31Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, yet the attention weights themselves are often relegated to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
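A minimal caricature of that inversion of roles: embed each step by its recent lags, attend from the latest step to the past, and treat the resulting weight vector itself as the representation, forecasting as the weighted average of past one-step outcomes. The lag embedding and all names here are assumptions for illustration, not the paper's method.

```python
import numpy as np

def attention_weight_forecast(history, d_lag=8, tau=1.0):
    """Forecast the next value from attention weights alone."""
    x = np.lib.stride_tricks.sliding_window_view(history, d_lag)  # (N, d_lag)
    query, keys = x[-1], x[:-1]            # latest window vs. earlier windows
    nexts = history[d_lag:]                # value following each earlier window
    scores = keys @ query / (tau * np.sqrt(d_lag))
    w = np.exp(scores - scores.max())
    w /= w.sum()                           # attention weights = representation
    return w @ nexts                       # weighted average of past outcomes

rng = np.random.default_rng(5)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 30) + 0.05 * rng.normal(size=t.size)
print(attention_weight_forecast(series))
```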
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
- Perceiver-based CDF Modeling for Time Series Forecasting [25.26713741799865]
We propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data.
Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction.
Experiments on unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2023-10-03T01:13:17Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity by means of the self-attention mechanism, albeit at high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
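The underlying recipe, minus the paper's ensembling and stochastic-forcing machinery, is an exogenously forced linear system identified by least squares (DMD with control, in spirit). A small synthetic sketch; the forcing signal, dimensions, and dynamics are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 500, 3

# Synthetic near-periodic "load" states driven by a known forcing signal.
u = np.sin(2 * np.pi * np.arange(T) / 48).reshape(-1, 1)   # forcing (1-dim)
A_true = np.array([[0.9, -0.2, 0.0], [0.2, 0.9, 0.0], [0.0, 0.0, 0.5]])
B_true = np.array([[0.5], [0.0], [1.0]])
x = np.zeros((T, n))
for t in range(T - 1):
    x[t + 1] = A_true @ x[t] + B_true @ u[t] + 0.01 * rng.normal(size=n)

# Least-squares fit of the forced linear model x_{t+1} = A x_t + B u_t.
Z = np.hstack([x[:-1], u[:-1]])                 # regressors: state and forcing
coef, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
A_hat, B_hat = coef[:n].T, coef[n:].T

# One-step forecast with the identified linear dynamics.
pred = x[:-1] @ A_hat.T + u[:-1] @ B_hat.T
print(np.abs(pred - x[1:]).mean())              # small one-step error
```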
arXiv Detail & Related papers (2020-10-08T20:25:52Z)