Are Self-Attentions Effective for Time Series Forecasting?
- URL: http://arxiv.org/abs/2405.16877v3
- Date: Mon, 23 Dec 2024 13:34:55 GMT
- Title: Are Self-Attentions Effective for Time Series Forecasting?
- Authors: Dongbin Kim, Jinseong Park, Jaewook Lee, Hoki Kim,
- Abstract summary: Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, Cross-Attention-only Time Series transformer (CATS)
Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
- Score: 4.990206466948269
- License:
- Abstract: Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically advanced the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift the focus from evaluating the overall Transformer architecture to specifically examining the effectiveness of self-attention for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhanced parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiment across various datasets demonstrates that our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models. The implementation of our model is available at: https://github.com/dongbeank/CATS.
Related papers
- Bridging Simplicity and Sophistication using GLinear: A Novel Architecture for Enhanced Time Series Prediction [7.887606414896063]
Time Series Forecasting (TSF) is an important application across many fields.
Recent research suggests simpler linear models might outperform or at least provide competitive performance compared to complex Transformer-based models for TSF tasks.
We propose a novel data-efficient architecture, GLinear, for multivariate TSF that exploits periodic patterns to provide better accuracy.
arXiv Detail & Related papers (2025-01-02T06:19:53Z) - A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting [0.07916635054977067]
Pruning is an established approach to reduce neural network parameter count and save compute.
We study the effects of these pruning strategies on model predictive performance and computational aspects like model size, operations, and inference time.
We demonstrate that even with corresponding hardware and software support, structured pruning is unable to provide significant time savings.
arXiv Detail & Related papers (2024-12-17T13:07:31Z) - LSEAttention is All You Need for Time Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.
Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain.
We introduce LATST, a novel approach designed to mitigate entropy collapse and training instability common challenges in Transformer-based time series forecasting.
arXiv Detail & Related papers (2024-10-31T09:09:39Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - Enhancing Transformer-based models for Long Sequence Time Series Forecasting via Structured Matrix [7.3758245014991255]
Self-attention mechanism as the core component of Transformer-based models exhibits great potential.
We propose a novel architectural framework that enhances Transformer-based models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB)
The framework reduces both time and space complexity by the replacement of the self-attention and feed-forward layers with SAB and SFB.
arXiv Detail & Related papers (2024-05-21T02:37:47Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked-based Universal Time Series Forecasting Transformer (Moirai)
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM)
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS)
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - Two Steps Forward and One Behind: Rethinking Time Series Forecasting
with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that are better performing and significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.