CARD: Channel Aligned Robust Blend Transformer for Time Series
Forecasting
- URL: http://arxiv.org/abs/2305.12095v5
- Date: Fri, 16 Feb 2024 02:23:12 GMT
- Title: CARD: Channel Aligned Robust Blend Transformer for Time Series
Forecasting
- Authors: Wang Xue, Tian Zhou, Qingsong Wen, Jinyang Gao, Bolin Ding, Rong Jin
- Abstract summary: We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI-type Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
- Score: 50.23240107430597
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent studies have demonstrated the great power of Transformer models for
time series forecasting. One of the key elements behind the Transformer's
success is the channel-independent (CI) strategy, which improves training
robustness. However, ignoring the correlation among different channels in CI
limits the model's forecasting capacity. In this work, we design a
special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for
short), that addresses key shortcomings of CI-type Transformers in time series
forecasting. First, CARD introduces a channel-aligned attention structure that
allows it to capture both temporal correlations among signals and dynamical
dependence among multiple variables over time. Second, in order to efficiently
utilize the multi-scale knowledge, we design a token blend module to generate
tokens with different resolutions. Third, we introduce a robust loss function
for time series forecasting to alleviate the potential overfitting issue. This
new loss function weights the importance of forecasting over a finite horizon
based on prediction uncertainties. Our evaluation of multiple long-term and
short-term forecasting datasets demonstrates that CARD significantly
outperforms state-of-the-art time series forecasting methods. The code is
available at the following repository: https://github.com/wxie9/CARD
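To make the third contribution concrete, below is a minimal, hypothetical PyTorch sketch of a horizon-weighted robust loss. The 1/sqrt(t) decay schedule and the L1 error are illustrative assumptions, not the paper's exact formulation: CARD derives its weights from prediction uncertainties, and the authors' actual implementation lives in the repository above.

```python
import torch


def horizon_weighted_mae(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical horizon-weighted robust loss (illustrative only).

    pred, target: tensors of shape (batch, horizon, channels).
    Far-horizon predictions are assumed to be more uncertain, so their
    errors are down-weighted; CARD's actual weighting is derived from
    prediction uncertainties and may use a different schedule.
    """
    horizon = pred.shape[1]
    # Assumed decay schedule: weight for step t (1-indexed) is t^{-1/2}.
    t = torch.arange(1, horizon + 1, dtype=pred.dtype, device=pred.device)
    weights = t.rsqrt()
    abs_err = (pred - target).abs()  # L1 error is robust to outliers
    # Broadcast the per-step weights over batch and channel dimensions.
    return (abs_err * weights[None, :, None]).mean()
```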
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
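As a rough, assumption-laden sketch of the inversion described above (not the paper's code): each variate's entire lookback window is embedded as one token, so self-attention mixes variates rather than time steps.

```python
import torch
import torch.nn as nn

# Illustrative shapes, not the paper's configuration.
batch, lookback, variates, d_model = 8, 96, 7, 64
x = torch.randn(batch, lookback, variates)

# Inverted tokenization: embed each variate's whole lookback series
# as a single token, giving (batch, variates, d_model).
embed = nn.Linear(lookback, d_model)
tokens = embed(x.transpose(1, 2))

# Attention now mixes variates rather than time steps.
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
mixed, _ = attn(tokens, tokens, tokens)
print(mixed.shape)  # torch.Size([8, 7, 64])
```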
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- Stecformer: Spatio-temporal Encoding Cascaded Transformer for Multivariate Long-term Time Series Forecasting [11.021398675773055]
We propose a complete solution to address these problems in terms of feature extraction and target prediction.
For feature extraction, we design an efficient spatio-temporal encoding extractor, including a semi-adaptive graph, to acquire sufficient spatio-temporal information.
For prediction, we propose a Cascaded Decoding Predictor (CDP) to strengthen the correlation between different intervals.
arXiv Detail & Related papers (2023-05-25T13:00:46Z)
- The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting [50.48888534815361]
We show that models trained with the Channel Independent (CI) strategy outperform those trained with the Channel Dependent (CD) strategy.
Our results indicate that the CD approach has higher capacity but often lacks the robustness to accurately predict distributionally drifted time series.
We propose a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy.
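For readers unfamiliar with the CI/CD terminology, here is a minimal shape-level sketch (illustrative tensors, not either paper's code) of how the two strategies present data to a model:

```python
import torch

# Illustrative tensors only. x: (batch, lookback, channels).
batch, lookback, channels = 8, 96, 7
x = torch.randn(batch, lookback, channels)

# Channel Independent (CI): every channel becomes its own univariate
# sample, and one shared model is trained across all of them.
ci_input = x.permute(0, 2, 1).reshape(batch * channels, lookback, 1)

# Channel Dependent (CD): the model sees all channels jointly and can
# exploit cross-channel correlations.
cd_input = x

print(ci_input.shape, cd_input.shape)  # (56, 96, 1) (8, 96, 7)
```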
arXiv Detail & Related papers (2023-04-11T13:15:33Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving classification capacity on the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- A Time Series is Worth 64 Words: Long-term Forecasting with Transformers [4.635547236305835]
We propose an efficient design of Transformer-based models for time series forecasting and self-supervised representation learning.
It is based on two key components: (i) segmentation of time series into subseries-level patches which serve as input tokens to the Transformer, and (ii) channel-independence, where each channel contains a single univariate time series sharing the same embedding and Transformer weights across all series.
PatchTST significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models.
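A minimal sketch of the patching step in (i), using tensor unfolding; the patch length and stride below are illustrative values, not the paper's defaults:

```python
import torch

# Illustrative values; not the paper's default patch settings.
batch, seq_len = 8, 336
patch_len, stride = 16, 8
x = torch.randn(batch, seq_len)  # one channel's series

# Segment the series into overlapping subseries-level patches; each
# patch later becomes one input token for the Transformer.
patches = x.unfold(dimension=1, size=patch_len, step=stride)
print(patches.shape)  # torch.Size([8, 41, 16])
```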
arXiv Detail & Related papers (2022-11-27T05:15:42Z)
- Are Transformers Effective for Time Series Forecasting? [13.268196448051308]
Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task.
This study investigates whether Transformer-based techniques are the right solutions for long-term time series forecasting.
We find that the relatively higher long-term forecasting accuracy of Transformer-based solutions has little to do with the temporal relation extraction capabilities of the Transformer architecture.
arXiv Detail & Related papers (2022-05-26T17:17:08Z)
- A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting [4.666618110838523]
Time series forecasting is widely used in equipment life cycle forecasting, weather forecasting, traffic flow forecasting, and other fields.
Some scholars have tried to apply Transformer to time series forecasting because of its powerful parallel training ability.
The existing Transformer methods do not pay enough attention to the small time segments that play a decisive role in prediction.
arXiv Detail & Related papers (2022-02-23T10:33:12Z) - Informer: Beyond Efficient Transformer for Long Sequence Time-Series
Forecasting [25.417560221400347]
Long sequence time-series forecasting (LSTF) demands a high prediction capacity.
Recent studies have shown the potential of Transformer to increase the prediction capacity.
We design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics.
arXiv Detail & Related papers (2020-12-14T11:43:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.