A K-variate Time Series Is Worth K Words: Evolution of the Vanilla
Transformer Architecture for Long-term Multivariate Time Series Forecasting
- URL: http://arxiv.org/abs/2212.02789v1
- Date: Tue, 6 Dec 2022 07:00:31 GMT
- Title: A K-variate Time Series Is Worth K Words: Evolution of the Vanilla
Transformer Architecture for Long-term Multivariate Time Series Forecasting
- Authors: Zanwei Zhou, Ruizhe Zhong, Chen Yang, Yan Wang, Xiaokang Yang, Wei
Shen
- Abstract summary: The Transformer has become the de facto solution for MTSF, especially for long-term cases.
In this study, we point out that the current tokenization strategy in MTSF Transformer architectures ignores the token uniformity inductive bias of Transformers.
We make a series of evolutionary changes to the basic architecture of the vanilla MTSF Transformer.
Surprisingly, the evolved simple Transformer architecture is highly effective and avoids the over-smoothing phenomenon of the vanilla MTSF Transformer.
- Score: 52.33042819442005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multivariate time series forecasting (MTSF) is a fundamental problem in
numerous real-world applications. Recently, the Transformer has become the de facto
solution for MTSF, especially for long-term cases. However, aside from the single
forward operation, the basic configurations of existing MTSF Transformer
architectures have barely been carefully verified. In this study, we point out that
the current tokenization strategy in MTSF Transformer architectures ignores the
token uniformity inductive bias of Transformers. As a result, the vanilla MTSF
Transformer struggles to capture details in time series and delivers inferior
performance. Based on this observation, we make a series of evolutionary changes to
the basic architecture of the vanilla MTSF Transformer, revising the flawed
tokenization strategy along with the decoder structure and the embeddings.
Surprisingly, the evolved simple Transformer architecture is highly effective: it
avoids the over-smoothing phenomenon of the vanilla MTSF Transformer, produces more
detailed and accurate predictions, and even substantially outperforms
state-of-the-art Transformers that are well-designed for MTSF.
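The title and abstract suggest that the central change is a variate-wise tokenization: a K-variate series is embedded as K tokens ("words"), one per variate, rather than one token per time step. The following is a minimal PyTorch sketch of that idea under this assumption; the module names, dimensions, and forecasting head are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VariateTokenForecaster(nn.Module):
    """Sketch: embed each of the K variates of a length-L lookback window as one
    token, let a vanilla Transformer encoder mix information across variates,
    then project each variate token to the forecast horizon.
    Hyperparameters are illustrative, not the paper's exact configuration."""

    def __init__(self, lookback: int, horizon: int, d_model: int = 128,
                 n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        # One linear map turns a variate's whole lookback window into a token.
        self.tokenize = nn.Linear(lookback, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, K) -> (batch, K, lookback): one "word" per variate.
        tokens = self.tokenize(x.transpose(1, 2))   # (batch, K, d_model)
        mixed = self.encoder(tokens)                # attention across the K variates
        return self.head(mixed).transpose(1, 2)     # (batch, horizon, K)

# Usage sketch: forecast 96 steps of a 7-variate series from a 336-step window.
model = VariateTokenForecaster(lookback=336, horizon=96)
y_hat = model(torch.randn(32, 336, 7))              # -> torch.Size([32, 96, 7])
```

Treating each variate as a single token keeps the token count at K regardless of the lookback length, which is the "K words" reading of the title; whether the paper retains additional per-time-step embeddings is not stated in the abstract.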
Related papers
- LSEAttention is All You Need for Time Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.
I introduce LSEAttention, an approach designed to address entropy collapse and training instability commonly observed in transformer models.
arXiv Detail & Related papers (2024-10-31T09:09:39Z) - PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - A Systematic Review for Transformer-based Long-term Series Forecasting [7.414422194379818]
The Transformer architecture has proven to be the most successful solution for extracting semantic correlations.
Various variants have enabled the Transformer architecture to handle long-term time series forecasting tasks.
arXiv Detail & Related papers (2023-10-31T06:37:51Z) - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions (see the sketch after this list).
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z) - CARD: Channel Aligned Robust Blend Transformer for Time Series
Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) type Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture temporal correlations among signals.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z) - W-Transformers : A Wavelet-based Transformer Framework for Univariate
Time Series Forecasting [7.075125892721573]
We build a transformer model for non-stationary time series using a wavelet-based transformer encoder architecture.
We evaluate our framework on several publicly available benchmark time series datasets from various domains.
arXiv Detail & Related papers (2022-09-08T17:39:38Z) - Transformers in Time-series Analysis: A Tutorial [0.0]
The Transformer architecture has widespread applications, particularly in natural language processing and computer vision.
This tutorial provides an overview of the Transformer architecture, its applications, and a collection of examples from recent research papers in time-series analysis.
arXiv Detail & Related papers (2022-04-28T05:17:45Z) - Transformers in Time Series: A Survey [66.50847574634726]
We systematically review Transformer schemes for time series modeling by highlighting their strengths as well as limitations.
From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers.
From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification.
arXiv Detail & Related papers (2022-02-15T01:43:27Z) - Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
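For the iTransformer entry above, the inversion can be shown in isolation: attention and the feed-forward network operate over variate tokens rather than temporal tokens. Below is a minimal sketch of that flip using standard PyTorch layers; the names and sizes are illustrative assumptions, not the released iTransformer code.

```python
import torch
import torch.nn as nn

# Sketch of the inverted attention axis: a window of shape (batch, time, variates)
# is re-arranged so each variate's whole window becomes one token, and attention
# plus the feed-forward network run across the variate axis.
batch, lookback, n_variates, d_model = 32, 96, 7, 64
x = torch.randn(batch, lookback, n_variates)

embed = nn.Linear(lookback, d_model)                 # one token per variate
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                    nn.Linear(4 * d_model, d_model))

tokens = embed(x.transpose(1, 2))                    # (batch, n_variates, d_model)
mixed, _ = attn(tokens, tokens, tokens)              # attention across the 7 variates
out = ffn(mixed + tokens)                            # feed-forward per variate token
print(out.shape)                                     # torch.Size([32, 7, 64])
```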