Are Self-Attentions Effective for Time Series Forecasting?
- URL: http://arxiv.org/abs/2405.16877v1
- Date: Mon, 27 May 2024 06:49:39 GMT
- Title: Are Self-Attentions Effective for Time Series Forecasting?
- Authors: Dongbin Kim, Jinseong Park, Jaewook Lee, Hoki Kim,
- Abstract summary: Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework.
- Score: 4.990206466948269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically shifted the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift focus from the overall architecture of the Transformer to the effectiveness of self-attentions for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhanced parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiment across various datasets demonstrates that our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
Related papers
- LSEAttention is All You Need for Time Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.
I introduce textbfLSEAttention, an approach designed to address entropy collapse and training instability commonly observed in transformer models.
arXiv Detail & Related papers (2024-10-31T09:09:39Z) - sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting [6.434378359932152]
We review and categorize existing Transformer-based models into two main types: (1) modifications to the model structure and (2) modifications to the input data.
We propose $textbfsTransformer$, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information.
We compare our model with linear models and existing forecasting models on long-term time-series forecasting, achieving new state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T06:23:41Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked-based Universal Time Series Forecasting Transformer (Moirai)
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM)
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS)
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - Two Steps Forward and One Behind: Rethinking Time Series Forecasting
with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that are better performing and significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Persistence Initialization: A novel adaptation of the Transformer
architecture for Time Series Forecasting [0.7734726150561088]
Time series forecasting is an important problem, with many real world applications.
We propose a novel adaptation of the original Transformer architecture focusing on the task of time series forecasting.
We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model.
arXiv Detail & Related papers (2022-08-30T13:04:48Z) - Spatio-Temporal Wind Speed Forecasting using Graph Networks and Novel
Transformer Architectures [1.278093617645299]
This study focuses on multi-step-temporal wind speed forecasting for the Norwegian continental shelf.
A graph neural network (GNN) architecture was used to extract spatial dependencies, with different update functions to learn temporal correlations.
This is the first time the LogSparse Transformer and Autoformer have been applied to wind forecasting.
arXiv Detail & Related papers (2022-08-29T13:26:20Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.