Autoregressive Moving-average Attention Mechanism for Time Series Forecasting
- URL: http://arxiv.org/abs/2410.03159v1
- Date: Fri, 4 Oct 2024 05:45:50 GMT
- Title: Autoregressive Moving-average Attention Mechanism for Time Series Forecasting
- Authors: Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang
- Abstract summary: We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms.
In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines.
- Score: 9.114664059026767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms, enhancing their ability to capture long-range and local temporal patterns in time series. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements for local temporal impacts. Experimental results show that incorporating the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.
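To make the ARMA structure concrete, here is a minimal NumPy sketch of the idea: a causal linear-attention AR term plus an MA correction built from past residuals, with the MA weights generated indirectly from the current token rather than stored as extra attention parameters. The feature map, the weight-generation rule, and all projection matrices below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_linear_attention(q, k, v):
    """Causal linear attention: out_t = (sum_{s<=t} phi(q_t)·phi(k_s) v_s) / normalizer.
    phi is a simple positive feature map (elu + 1 here)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)
    T, d = v.shape
    out = np.zeros_like(v)
    kv = np.zeros((q.shape[1], d))   # running sum of outer(k_s, v_s)
    ksum = np.zeros(q.shape[1])      # running sum of k_s
    for t in range(T):
        kv += np.outer(k[t], v[t])
        ksum += k[t]
        out[t] = q[t] @ kv / (q[t] @ ksum + 1e-6)
    return out

def arma_attention(x, d_model=16, ma_window=8):
    """Sketch of an ARMA attention block:
    AR term = causal linear attention over the input tokens;
    MA term = weighted sum of recent residuals, with weights generated
              indirectly from the current token (hypothetical generator)."""
    T, d_in = x.shape
    # Hypothetical projections (would be learned in practice).
    Wq, Wk, Wv = (rng.normal(0, d_in ** -0.5, (d_in, d_model)) for _ in range(3))
    Wg = rng.normal(0, d_in ** -0.5, (d_in, ma_window))  # indirect MA-weight generator
    Wo = rng.normal(0, d_model ** -0.5, (d_model, d_in))

    ar = causal_linear_attention(x @ Wq, x @ Wk, x @ Wv) @ Wo   # AR part
    resid = x - ar                                              # innovation proxies
    out = ar.copy()
    for t in range(T):
        w = np.tanh(x[t] @ Wg)                  # implicit MA weights for token t
        for j in range(1, min(ma_window, t) + 1):
            out[t] += w[j - 1] * resid[t - j]   # MA correction from past residuals
    return out

series = rng.normal(size=(64, 4))    # toy multivariate series: 64 steps, 4 channels
print(arma_attention(series).shape)  # -> (64, 4)
```

With a fixed residual window and weights produced by a small projection of the input, the extra cost stays linear in sequence length, in line with the abstract's claim that the MA term preserves the complexity and parameter size of the underlying efficient attention.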
Related papers
- Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting [0.9361474110798144]
We show that a single linear attention layer can be interpreted as a dynamic vector autoregressive (VAR) structure.
By rearranging the generalization, attention, and input-output flow, multi-layer linear attention can also be aligned as a VAR model.
We propose Structural Aligned Mixture of VAR, a linear Transformer variant that integrates interpretable dynamic VAR weights for multivariate TSF.
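As a quick illustration of this VAR reading of linear attention, the sketch below computes a causal linear attention output in two equivalent ways: once with the usual query-key-value form, and once as a weighted sum of past value vectors with data-dependent coefficients, i.e. a vector autoregression whose weights change at every step. The positive feature map is an assumption, and the coefficients here are scalar per lag for simplicity, whereas a full VAR would use matrix coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 12, 3
q, k, v = rng.normal(size=(3, T, d))

phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # assumed positive feature map
Q, K = phi(q), phi(k)

# (a) Standard causal linear attention output.
out = np.zeros_like(v)
for t in range(T):
    scores = Q[t] @ K[: t + 1].T                 # unnormalized weights over past steps
    out[t] = (scores / scores.sum()) @ v[: t + 1]

# (b) The same output written as a dynamic VAR: y_t = sum_{s<=t} A_t[s] * v_s,
#     where the coefficients A_t depend on the data through q_t and k_s.
var_out = np.zeros_like(v)
for t in range(T):
    A_t = Q[t] @ K[: t + 1].T
    A_t = A_t / A_t.sum()                        # time-varying VAR coefficients
    var_out[t] = sum(A_t[s] * v[s] for s in range(t + 1))

print(np.allclose(out, var_out))                 # True: linear attention == dynamic VAR
```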
arXiv Detail & Related papers (2025-02-11T04:24:43Z) - Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
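A minimal sketch of the reweighting idea described above: causal softmax attention whose weights are damped by a smooth heavy-tailed factor (1 + lag)^(-alpha), so nearby steps dominate while distant steps retain a long tail. The exact decay form and the value of alpha are assumptions, not Powerformer's published parameterization.

```python
import numpy as np

def powerlaw_causal_attention(q, k, v, alpha=1.0):
    """Causal softmax attention damped by a heavy-tailed power-law decay
    (1 + lag)^(-alpha); illustrative form only."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    lags = np.arange(T)[:, None] - np.arange(T)[None, :]        # t - s
    causal = lags >= 0
    decay = np.where(causal, (1.0 + np.maximum(lags, 0)) ** -alpha, 0.0)
    scores = np.where(causal, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights * decay                                    # reweight by the decay
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(2)
q, k, v = rng.normal(size=(3, 32, 8))
print(powerlaw_causal_attention(q, k, v).shape)   # (32, 8)
```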
arXiv Detail & Related papers (2025-02-10T04:42:11Z) - Enabling Autoregressive Models to Fill In Masked Tokens [50.9948753314669]
This work introduces MARIA (Masked and Autoregressive Infilling Architecture), a novel approach that achieves state-of-the-art masked infilling performance.
MARIA combines a pre-trained masked language model (MLM) and an AR model by training a linear decoder that takes their hidden states as input.
Our results demonstrate that MARIA significantly outperforms existing methods, namely discrete diffusion models, on masked infilling tasks.
arXiv Detail & Related papers (2025-02-09T20:02:05Z) - MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction [0.0]
We introduce the Multi-Attention Unit (MAUCell) which combines Generative Adversarial Networks (GANs) and attention mechanisms to improve video prediction.
The new design system maintains equilibrium between temporal continuity and spatial accuracy to deliver reliable video prediction.
arXiv Detail & Related papers (2025-01-28T14:52:10Z) - Transformer-Based Bearing Fault Detection using Temporal Decomposition Attention Mechanism [0.40964539027092917]
Bearing fault detection is a critical task in predictive maintenance, where accurate and timely fault identification can prevent costly downtime and equipment damage.
Traditional attention mechanisms in Transformer neural networks often struggle to capture the complex temporal patterns in bearing vibration data, leading to suboptimal performance.
We propose a novel attention mechanism, Temporal Decomposition Attention (TDA), which combines temporal bias encoding with seasonal-trend decomposition to capture both long-term dependencies and periodic fluctuations in time series data.
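The two ingredients named above can be illustrated separately: a classical moving-average seasonal-trend decomposition, and an additive temporal bias that penalizes attention to distant past steps. Both forms below are simplified, assumed stand-ins rather than the TDA formulation itself.

```python
import numpy as np

def seasonal_trend_decompose(x, period=24):
    """Classical moving-average decomposition: trend = centered moving average,
    seasonal/residual = what is left after removing the trend."""
    kernel = np.ones(period) / period
    pad = period // 2
    padded = np.pad(x, (pad, pad), mode="edge")
    trend = np.convolve(padded, kernel, mode="same")[pad:pad + len(x)]
    return trend, x - trend

def temporal_bias(T, decay=0.1):
    """Additive attention bias: causal mask plus a distance penalty on the lag
    (an assumed linear-in-lag form, purely illustrative)."""
    lags = np.arange(T)[:, None] - np.arange(T)[None, :]
    return np.where(lags >= 0, -decay * lags, -np.inf)

rng = np.random.default_rng(3)
x = np.sin(np.arange(200) * 2 * np.pi / 24) + 0.1 * rng.normal(size=200)
trend, seasonal = seasonal_trend_decompose(x)
print(trend.shape, seasonal.shape)
print(temporal_bias(4))
```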
arXiv Detail & Related papers (2024-12-15T16:51:31Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Diffusion Auto-regressive Transformer for Effective Self-supervised Time Series Forecasting [47.58016750718323]
We propose a novel generative self-supervised method called TimeDART.
TimeDART captures both the global sequence dependence and local detail features within time series data.
Our code is publicly available at https://github.com/Melmaphother/TimeDART.
arXiv Detail & Related papers (2024-10-08T06:08:33Z) - Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting [8.841114905151152]
Local Attention Mechanism (LAM) is an efficient attention mechanism tailored for time series analysis.
LAM exploits the continuity properties of time series to reduce the number of attention scores computed.
We present an algorithm for implementing LAM in tensor algebra that runs in O(n log n) time and memory.
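For intuition about the locality idea, the sketch below implements a plain windowed causal attention that scores only the most recent `window` steps, reducing the number of attention scores from O(n^2) to O(n * window). It illustrates the exploitation of temporal continuity only; the paper's O(n log n) tensor-algebra algorithm is not reproduced here.

```python
import numpy as np

def local_causal_attention(q, k, v, window=16):
    """Attend only to the `window` most recent steps (including the current one).
    This reduces the attention-score count from O(n^2) to O(n * window)."""
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        out[t] = (w / w.sum()) @ v[lo:t + 1]
    return out

rng = np.random.default_rng(4)
q, k, v = rng.normal(size=(3, 128, 8))
print(local_causal_attention(q, k, v).shape)   # (128, 8)
```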
arXiv Detail & Related papers (2024-10-04T11:32:02Z) - Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting [5.354055742467354]
Low-Rank Adaptation (LoRA) is a technique for fine-tuning large pre-trained or foundational models across different modalities and tasks.
This paper examines the impact of LoRA on contemporary time series foundational models: Lag-Llama, MOIRAI, and Chronos.
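For reference, LoRA itself reduces to a frozen weight matrix plus a trainable low-rank update, as in the NumPy sketch below. The rank, shapes, and scaling are illustrative, and none of the named foundational models (Lag-Llama, MOIRAI, Chronos) are actually loaded here.

```python
import numpy as np

rng = np.random.default_rng(5)
d_in, d_out, rank = 64, 64, 4

W = rng.normal(0, d_in ** -0.5, (d_in, d_out))      # frozen pre-trained weight
A = rng.normal(0, 0.01, (d_in, rank))                # trainable low-rank factor
B = np.zeros((rank, d_out))                          # init so the update starts at zero

def lora_forward(x, scale=1.0):
    """Adapted layer: y = x W + scale * x A B. Only A and B are updated during
    fine-tuning, so the trainable parameter count is rank * (d_in + d_out)
    instead of d_in * d_out."""
    return x @ W + scale * (x @ A) @ B

x = rng.normal(size=(8, d_in))
print(lora_forward(x).shape)                         # (8, 64)
print("trainable params:", A.size + B.size, "vs full:", W.size)
```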
arXiv Detail & Related papers (2024-05-16T16:05:33Z) - Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
arXiv Detail & Related papers (2024-02-08T03:00:50Z) - Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z) - Enhanced LFTSformer: A Novel Long-Term Financial Time Series Prediction Model Using Advanced Feature Engineering and the DS Encoder Informer Architecture [0.8532753451809455]
This study presents a groundbreaking model for forecasting long-term financial time series, termed the Enhanced LFTSformer.
The model distinguishes itself through several significant innovations.
Systematic experimentation on a range of benchmark stock market datasets demonstrates that the Enhanced LFTSformer outperforms traditional machine learning models.
arXiv Detail & Related papers (2023-10-03T08:37:21Z) - Perceiver-based CDF Modeling for Time Series Forecasting [25.26713741799865]
We propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data.
Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction.
Experiments on the unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2023-10-03T01:13:17Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, despite its high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - An Attention Free Long Short-Term Memory for Time Series Forecasting [0.0]
We focused on time series forecasting using an attention-free mechanism, a more efficient framework, and proposed a new architecture for time series prediction.
The proposed architecture is built from attention-free LSTM layers and outperforms linear models for conditional variance prediction.
arXiv Detail & Related papers (2022-09-20T08:23:49Z) - Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z) - Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
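The core modeling step described above, treating observed dynamics as a forced linear system, can be sketched as fitting x_{t+1} ≈ A x_t + B u_t by least squares from state and forcing snapshots (dynamic mode decomposition with control, in spirit). The toy system and forcing signal below are assumptions; the paper's stochastic forcing and ensembling are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate a toy "load" state driven by a known forcing signal u (e.g. temperature).
T, n = 500, 3
A_true = np.array([[0.9, 0.05, 0.0], [0.0, 0.8, 0.1], [0.0, 0.0, 0.95]])
B_true = np.array([[0.1], [0.0], [0.05]])
u = np.sin(np.arange(T) * 2 * np.pi / 48).reshape(-1, 1)
x = np.zeros((T, n))
for t in range(T - 1):
    x[t + 1] = A_true @ x[t] + (B_true @ u[t]).ravel() + 0.01 * rng.normal(size=n)

# Fit x_{t+1} ~= A x_t + B u_t by least squares from the snapshots.
Z = np.hstack([x[:-1], u[:-1]])                  # regressors: [state, forcing]
theta, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
A_hat, B_hat = theta[:n].T, theta[n:].T
print(np.round(A_hat, 2))                        # close to A_true
```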
arXiv Detail & Related papers (2020-10-08T20:25:52Z)