Autoregressive Moving-average Attention Mechanism for Time Series Forecasting
- URL: http://arxiv.org/abs/2410.03159v1
- Date: Fri, 4 Oct 2024 05:45:50 GMT
- Title: Autoregressive Moving-average Attention Mechanism for Time Series Forecasting
- Authors: Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang
- Abstract summary: We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms.
In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines.
- Score: 9.114664059026767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms, enhancing their ability to capture long-range and local temporal patterns in time series. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements for local temporal impacts. Experimental results show that incorporating the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.
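The abstract describes the ARMA structure only at a high level, so the following is a minimal sketch of the idea, not the authors' implementation: a causal linear-attention AR term plus an MA correction whose weights are generated indirectly from the input. The feature map, the weight-generation scheme, and all names here are assumptions.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """AR term: causal linear attention via running prefix sums, O(T * d^2)."""
    T, d = q.shape
    phi = lambda a: np.maximum(a, 0.0) + 1e-6     # assumed positive feature map
    q, k = phi(q), phi(k)
    kv = np.zeros((d, v.shape[1]))                # running sum of outer(k_t, v_t)
    z = np.zeros(d)                               # running sum of k_t
    out = np.zeros_like(v)
    for t in range(T):
        kv += np.outer(k[t], v[t])
        z += k[t]
        out[t] = (q[t] @ kv) / (q[t] @ z + 1e-6)
    return out

def arma_attention(x, w_q, w_k, w_v, w_theta, ma_order=4):
    """AR output plus a causal MA correction over recent residuals; the MA
    weights are generated indirectly from the current token (an assumption,
    standing in for the paper's indirect MA weight generation)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    ar = causal_linear_attention(q, k, v)
    resid = x - ar                                # proxy for past "innovations"
    theta = np.tanh(x @ w_theta)                  # (T, ma_order) implicit MA weights
    out = ar.copy()
    for t in range(x.shape[0]):
        for j in range(1, ma_order + 1):          # causal MA over recent residuals
            if t - j >= 0:
                out[t] += theta[t, j - 1] * resid[t - j]
    return out

# Example shapes: x (T, d) with square projections and w_theta of shape (d, ma_order).
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(8, 8)) for _ in range(3))
y = arma_attention(x, w_q, w_k, w_v, rng.normal(scale=0.1, size=(8, 4)))
```

The residual-based MA term mirrors the statistical ARMA(p, q) decomposition: the AR part captures long-range structure, while the indirectly generated theta weights absorb short-lived local shocks without adding parameters beyond one small projection.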
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Diffusion Auto-regressive Transformer for Effective Self-supervised Time Series Forecasting [47.58016750718323]
We propose a novel generative self-supervised method called TimeDART.
TimeDART captures both the global sequence dependence and local detail features within time series data.
Our code is publicly available at https://github.com/Melmaphother/TimeDART.
arXiv Detail & Related papers (2024-10-08T06:08:33Z)
- Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting [8.841114905151152]
Local Attention Mechanism (LAM) is an efficient attention mechanism tailored for time series analysis.
LAM exploits the continuity properties of time series to reduce the number of attention scores computed.
We present an algorithm for implementing LAM in tensor algebra that runs in O(n log n) time and memory.
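The summary reports only the complexity; the paper's tensor-algebra algorithm that reaches O(n log n) is not spelled out here. As a hedged illustration of the locality idea (computing scores only over a trailing temporal neighborhood), here is a simple windowed causal attention sketch, with the window size and names assumed:

```python
import numpy as np

def local_attention(q, k, v, window=16):
    """Attend only over a trailing window of neighbors per query, so O(n * w)
    scores instead of O(n^2). (LAM's published algorithm reaches O(n log n)
    via tensor algebra; this sketch shows only the locality idea.)"""
    n, d = q.shape
    out = np.zeros_like(v)
    for t in range(n):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ v[lo:t + 1]
    return out
```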
arXiv Detail & Related papers (2024-10-04T11:32:02Z)
- Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting [5.354055742467354]
Low-Rank Adaptation (LoRA) is a technique for fine-tuning large pre-trained or foundational models across different modalities and tasks.
This paper examines the impact of LoRA on contemporary time series foundational models: Lag-Llama, MOIRAI, and Chronos.
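LoRA's parameterization itself is standard (a frozen weight plus a trainable low-rank update, scaled by alpha/r, per Hu et al., 2021); where the adapters attach inside Lag-Llama, MOIRAI, or Chronos is not specified by the summary, so the sketch below is generic:

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus trainable low-rank update (alpha/r) * B @ A,
    following the standard LoRA parameterization. Which layers of the
    foundation models get adapters is an open assumption here."""
    def __init__(self, W, r=8, alpha=16, rng=np.random.default_rng(0)):
        d_out, d_in = W.shape
        self.W = W                                # frozen pretrained weight
        self.A = rng.normal(0, 0.01, (r, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, r))             # trainable; zero init => no-op at start
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```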
arXiv Detail & Related papers (2024-05-16T16:05:33Z)
- Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, yet attention weights are often relegated to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
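The summary does not say how the attention weights are consumed downstream; as a loose sketch of the idea of using the attention map itself, rather than the attended values, as the series representation, one could expose each step's causal attention distribution as its feature vector. Everything below, names included, is an assumption:

```python
import numpy as np

def attention_weight_features(x, w_q, w_k):
    """Return the causal attention *weights* (not the fused values) as the
    representation of each time step; a downstream head would consume these.
    A loose sketch of the idea, not the paper's construction."""
    T, d = x.shape
    q, k = x @ w_q, x @ w_k
    feats = np.zeros((T, T))
    for t in range(T):
        s = q[t] @ k[:t + 1].T / np.sqrt(d)
        w = np.exp(s - s.max())
        w /= w.sum()
        feats[t, :t + 1] = w          # attention distribution = feature vector
    return feats
```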
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, which rely on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- Enhanced LFTSformer: A Novel Long-Term Financial Time Series Prediction Model Using Advanced Feature Engineering and the DS Encoder Informer Architecture [0.8532753451809455]
This study presents a groundbreaking model for forecasting long-term financial time series, termed the Enhanced LFTSformer.
The model distinguishes itself through several significant innovations.
Systematic experimentation on a range of benchmark stock market datasets demonstrates that the Enhanced LFTSformer outperforms traditional machine learning models.
arXiv Detail & Related papers (2023-10-03T08:37:21Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high-capacity self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- An Attention Free Long Short-Term Memory for Time Series Forecasting [0.0]
We focus on time series forecasting using an attention-free mechanism, a more efficient framework, and propose a new architecture for time series prediction.
We propose an architecture built from attention-free LSTM layers that outperforms linear models for conditional variance prediction.
arXiv Detail & Related papers (2022-09-20T08:23:49Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
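A forced linear system x_{t+1} ≈ A x_t + B u_t can be fit by least squares in the style of dynamic mode decomposition with control; the paper's ensembling and stochastic-forcing treatment is omitted in this hedged sketch, and all names are illustrative:

```python
import numpy as np

def fit_forced_linear_system(X, U):
    """Least-squares fit of x_{t+1} ~ A x_t + B u_t (DMD-with-control style).
    X: (d, T) state snapshots; U: (m, T-1) exogenous forcing. The paper's
    ensembling and stochastic-forcing machinery is not reproduced here."""
    X0, X1 = X[:, :-1], X[:, 1:]
    G = np.vstack([X0, U])              # stacked regressors [x_t; u_t]
    AB = X1 @ np.linalg.pinv(G)         # (d, d+m) operator [A | B]
    d = X.shape[0]
    return AB[:, :d], AB[:, d:]

def forecast(A, B, x0, U_future):
    """Roll the fitted linear system forward under a given forcing sequence."""
    x, out = x0, []
    for u in U_future.T:
        x = A @ x + B @ u
        out.append(x)
    return np.stack(out, axis=1)        # (d, horizon)
```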
arXiv Detail & Related papers (2020-10-08T20:25:52Z)