Non-stationary Transformers: Exploring the Stationarity in Time Series
Forecasting
- URL: http://arxiv.org/abs/2205.14415v4
- Date: Fri, 24 Nov 2023 09:01:12 GMT
- Title: Non-stationary Transformers: Exploring the Stationarity in Time Series
Forecasting
- Authors: Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long
- Abstract summary: We propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention.
Our framework consistently boosts mainstream Transformers by a large margin, which reduces MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer.
- Score: 86.33543833145457
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformers have shown great power in time series forecasting due to their
global-range modeling ability. However, their performance can degenerate
terribly on non-stationary real-world data in which the joint distribution
changes over time. Previous studies primarily adopt stationarization to
attenuate the non-stationarity of original series for better predictability.
But the stationarized series deprived of inherent non-stationarity can be less
instructive for real-world bursty events forecasting. This problem, termed
over-stationarization in this paper, leads Transformers to generate
indistinguishable temporal attentions for different series and impedes the
predictive capability of deep models. To tackle the dilemma between series
predictability and model capability, we propose Non-stationary Transformers as
a generic framework with two interdependent modules: Series Stationarization
and De-stationary Attention. Concretely, Series Stationarization unifies the
statistics of each input and converts the output with restored statistics for
better predictability. To address the over-stationarization problem,
De-stationary Attention is devised to recover the intrinsic non-stationary
information into temporal dependencies by approximating distinguishable
attentions learned from raw series. Our Non-stationary Transformers framework
consistently boosts mainstream Transformers by a large margin, which reduces
MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer,
making them the state-of-the-art in time series forecasting. Code is available
at this repository: https://github.com/thuml/Nonstationary_Transformers.
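For intuition, below is a minimal PyTorch-style sketch of the two modules named in the abstract. It is an illustrative approximation under stated assumptions, not the released implementation (see the repository above): the class names, the tau/delta rescaling factors, and all tensor shapes are assumptions made for this example.

import torch
import torch.nn as nn

class SeriesStationarization(nn.Module):
    # Unify the statistics of each input window, then restore them on the output,
    # matching the normalize-then-denormalize idea described in the abstract.
    def normalize(self, x):
        # x: [batch, length, variables]
        self.mean = x.mean(dim=1, keepdim=True)
        self.std = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + 1e-5)
        return (x - self.mean) / self.std

    def denormalize(self, y):
        # convert the model output back with the restored per-window statistics
        return y * self.std + self.mean

class DestationaryAttention(nn.Module):
    # Scaled dot-product attention whose logits are rescaled by factors learned
    # from the raw (un-normalized) series, so attention computed on stationarized
    # inputs can approximate the distinguishable attention of the raw series.
    def __init__(self, d_head):
        super().__init__()
        self.scale = d_head ** -0.5

    def forward(self, q, k, v, tau, delta):
        # q, k, v: [batch, heads, length, d_head]
        # tau: [batch, 1, 1, 1] scaling factor; delta: [batch, 1, 1, length] shift
        # (how tau and delta are produced from the raw series is omitted here)
        logits = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        logits = tau * logits + delta  # re-inject the removed non-stationary information
        attn = torch.softmax(logits, dim=-1)
        return torch.matmul(attn, v)

# Usage sketch with hypothetical shapes: normalize the input window, run the
# Transformer on the stationarized series, then de-normalize the forecast.
ss = SeriesStationarization()
x = torch.randn(32, 96, 7)          # batch of raw multivariate series
x_norm = ss.normalize(x)            # fed to the Transformer encoder/decoder
# ... Transformer forward pass using DestationaryAttention goes here ...
forecast = ss.denormalize(torch.randn(32, 96, 7))  # placeholder model output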
Related papers
- LSEAttention is All You Need for Time Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.
I introduce LSEAttention, an approach designed to address entropy collapse and training instability commonly observed in transformer models.
arXiv Detail & Related papers (2024-10-31T09:09:39Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - Considering Nonstationary within Multivariate Time Series with
Variational Hierarchical Transformer for Forecasting [12.793705636683402]
We develop a powerful hierarchical probabilistic generative module to consider the non-stationarity and intrinsic characteristics within MTS.
We then combine it with a Transformer to obtain a well-defined variational generative dynamic model named Hierarchical Time series Variational Transformer (HTV-Trans).
Being a powerful probabilistic model, HTV-Trans is utilized to learn expressive representations of MTS and applied to forecasting tasks.
arXiv Detail & Related papers (2024-03-08T16:04:36Z) - Transformers with Attentive Federated Aggregation for Time Series Stock
Forecasting [15.968396756506236]
Transformers have come into widespread use across many time series applications.
Their adaptation to time series forecasting, however, has remained limited, with both promising and inconsistent results.
We propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises.
arXiv Detail & Related papers (2024-01-22T07:33:28Z) - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z) - CARD: Channel Aligned Robust Blend Transformer for Time Series
Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity, though the self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series
Forecasting [24.510978166050293]
This work is the first attempt to propose a Non-Autoregressive Transformer architecture for time series forecasting.
We present a novel spatial-temporal attention mechanism that uses a learned temporal influence map to bridge the gap between spatial and temporal attention.
arXiv Detail & Related papers (2021-02-10T18:36:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.