Are Transformers Effective for Time Series Forecasting?
- URL: http://arxiv.org/abs/2205.13504v1
- Date: Thu, 26 May 2022 17:17:08 GMT
- Title: Are Transformers Effective for Time Series Forecasting?
- Authors: Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu
- Abstract summary: Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task.
This study investigates whether Transformer-based techniques are the right solutions for long-term time series forecasting.
We find that the relatively higher long-term forecasting accuracy of Transformer-based solutions has little to do with the temporal relation extraction capabilities of the Transformer architecture.
- Score: 13.268196448051308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been a surge of Transformer-based solutions for the time
series forecasting (TSF) task, especially for the challenging long-term TSF
problem. The Transformer architecture relies on self-attention mechanisms to
extract the semantic correlations between paired elements in a long sequence, a
mechanism that is largely permutation-invariant and therefore insensitive to
ordering. In time series modeling, however, the goal is to extract the temporal
relations within an ordered set of continuous points. Consequently, whether
Transformer-based techniques are the right solutions for long-term time series
forecasting is an interesting problem to investigate, despite the performance
improvements shown in these studies. In this work, we question the validity of
Transformer-based TSF solutions. In their experiments, the compared
(non-Transformer) baselines are mainly autoregressive forecasting solutions,
which usually have a poor long-term prediction capability due to inevitable
error accumulation effects. In contrast, we use an embarrassingly simple
architecture named DLinear that conducts direct multi-step (DMS) forecasting
for comparison. DLinear decomposes the time series into a trend and a remainder
series and employs two one-layer linear networks to model these two series for
the forecasting task. Surprisingly, it outperforms existing complex
Transformer-based models in most cases by a large margin. Therefore, we
conclude that the relatively higher long-term forecasting accuracy of
Transformer-based TSF solutions shown in existing works has little to do with
the temporal relation extraction capabilities of the Transformer architecture.
Instead, it is mainly due to the non-autoregressive DMS forecasting strategy
these models adopt. We hope this study also motivates revisiting the validity of
Transformer-based solutions for other time series analysis tasks (e.g., anomaly
detection) in the future.
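For intuition, the architecture described in the abstract can be sketched in a few lines. The following is a minimal, illustrative DLinear-style model, not the authors' exact configuration: the window length, forecast horizon, moving-average kernel size, and zero-padded averaging are assumptions made for the example.

```python
import torch
import torch.nn as nn

class DLinearSketch(nn.Module):
    """Minimal sketch of a DLinear-style direct multi-step (DMS) forecaster.

    The input window is split into a trend component (moving average) and a
    remainder; each component is mapped to the full forecast horizon by a
    single linear layer, so all future steps are predicted in one shot and
    no autoregressive error accumulation occurs.
    """

    def __init__(self, input_len: int, horizon: int, kernel_size: int = 25):
        super().__init__()
        # Moving average along the time axis extracts the trend component
        # (zero-padded averaging here is a simplification for illustration).
        self.moving_avg = nn.AvgPool1d(kernel_size, stride=1,
                                       padding=kernel_size // 2,
                                       count_include_pad=False)
        self.linear_trend = nn.Linear(input_len, horizon)
        self.linear_remainder = nn.Linear(input_len, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, input_len); one linear map per component,
        # shared across channels.
        trend = self.moving_avg(x)
        remainder = x - trend
        return self.linear_trend(trend) + self.linear_remainder(remainder)

# Example: forecast 96 future steps from a 336-step window (illustrative sizes).
model = DLinearSketch(input_len=336, horizon=96)
y_hat = model(torch.randn(32, 7, 336))  # -> (32, 7, 96)
```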
Related papers
- LSEAttention is All You Need for Time Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.
I introduce LSEAttention, an approach designed to address entropy collapse and training instability commonly observed in transformer models.
arXiv Detail & Related papers (2024-10-31T09:09:39Z)
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
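The inversion can be illustrated in a few lines: each variate's whole input window is embedded as one token, so self-attention mixes information across variates rather than across time steps. The sizes and the vanilla PyTorch encoder below are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Inverted view: one token per variate, attention over variates (illustrative).
input_len, horizon, n_vars, d_model = 96, 24, 7, 64

embed = nn.Linear(input_len, d_model)             # embed each variate's window
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2)
project = nn.Linear(d_model, horizon)             # map back to the time axis

x = torch.randn(32, input_len, n_vars)            # (batch, time, variates)
tokens = embed(x.transpose(1, 2))                 # (batch, variates, d_model)
y_hat = project(encoder(tokens)).transpose(1, 2)  # (batch, horizon, variates)
```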
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- Stecformer: Spatio-temporal Encoding Cascaded Transformer for Multivariate Long-term Time Series Forecasting [11.021398675773055]
We propose a complete solution to address problems in terms of feature extraction and target prediction.
For extraction, we design an efficient spatio-temporal encoding extractor including a semi-adaptive graph to acquire sufficient spatio-temporal information.
For prediction, we propose a Cascaded Decoding Predictor (CDP) to strengthen the correlation between different intervals.
arXiv Detail & Related papers (2023-05-25T13:00:46Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture temporal correlations among signals.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, albeit at a high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Causal Transformer for Estimating Counterfactual Outcomes [18.640006398066188]
Estimating counterfactual outcomes over time from observational data is relevant for many applications.
We develop a novel Causal Transformer for estimating counterfactual outcomes over time.
Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders.
arXiv Detail & Related papers (2022-04-14T22:40:09Z)
- Transformers in Time Series: A Survey [66.50847574634726]
We systematically review Transformer schemes for time series modeling by highlighting their strengths as well as limitations.
From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers.
From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification.
arXiv Detail & Related papers (2022-02-15T01:43:27Z)
- ETSformer: Exponential Smoothing Transformers for Time-series Forecasting [35.76867542099019]
We propose ETSformer, a novel time-series Transformer architecture, which exploits the principle of exponential smoothing to improve Transformers for time-series forecasting.
In particular, inspired by the classical exponential smoothing methods in time-series forecasting, we propose the novel exponential smoothing attention (ESA) and frequency attention (FA) to replace the self-attention mechanism in vanilla Transformers, thus improving both accuracy and efficiency.
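The exponential-smoothing principle behind ESA can be illustrated with a simple weighting scheme in which attention weights decay geometrically with temporal distance. The decay form and the single smoothing factor below are illustrative assumptions; the paper's learned ESA and FA modules are considerably more elaborate.

```python
import torch

def exponential_smoothing_weights(length: int, alpha: float = 0.3) -> torch.Tensor:
    # Weight for a step that lies `lag` positions before the latest step is
    # proportional to alpha * (1 - alpha) ** lag, then normalized to sum to 1.
    lags = torch.arange(length).flip(0).float()
    w = alpha * (1 - alpha) ** lags
    return w / w.sum()

x = torch.randn(32, 96, 7)                       # (batch, time, variates)
w = exponential_smoothing_weights(x.shape[1])
level = torch.einsum("t,btv->bv", w, x)          # smoothed "level" per variate
```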
arXiv Detail & Related papers (2022-02-03T02:50:44Z)
- FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting [23.199388386249215]
We propose to combine Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of time series.
We exploit the fact that most time series tend to have a sparse representation in a well-known basis such as the Fourier transform.
Besides being more effective, the proposed method, termed Frequency Enhanced Decomposed Transformer (FEDformer), is more efficient than the standard Transformer.
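The sparsity idea can be illustrated with a standalone frequency filter that keeps only a handful of Fourier modes and reconstructs the series from them. The mode count and the random selection below are illustrative; FEDformer performs mode selection inside learned frequency-domain attention blocks rather than as a fixed preprocessing step.

```python
import torch

def keep_random_modes(x: torch.Tensor, n_modes: int = 8) -> torch.Tensor:
    # x: (batch, time, variates). Keep a random subset of Fourier modes and
    # discard the rest, illustrating a sparse frequency-domain representation.
    spec = torch.fft.rfft(x, dim=1)                  # (batch, freq, variates)
    keep = torch.randperm(spec.shape[1])[:n_modes]
    filtered = torch.zeros_like(spec)
    filtered[:, keep, :] = spec[:, keep, :]
    return torch.fft.irfft(filtered, n=x.shape[1], dim=1)

x = torch.randn(32, 96, 7)
x_sparse = keep_random_modes(x)                      # reconstruction from 8 modes
```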
arXiv Detail & Related papers (2022-01-30T06:24:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.