Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
- URL: http://arxiv.org/abs/2205.14415v4
- Date: Fri, 24 Nov 2023 09:01:12 GMT
- Title: Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
- Authors: Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long
- Abstract summary: We propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention.
Our framework consistently boosts mainstream Transformers by a large margin, reducing MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformers have shown great power in time series forecasting due to their
global-range modeling ability. However, their performance can degrade severely
on non-stationary real-world data, in which the joint distribution changes over
time. Previous studies primarily adopt stationarization to attenuate the
non-stationarity of the original series for better predictability. But
stationarized series, deprived of their inherent non-stationarity, can be less
instructive for forecasting real-world bursty events. This problem, termed
over-stationarization in this paper, leads Transformers to generate
indistinguishable temporal attentions for different series and impedes the
predictive capability of deep models. To tackle the dilemma between series
predictability and model capability, we propose Non-stationary Transformers as
a generic framework with two interdependent modules: Series Stationarization
and De-stationary Attention. Concretely, Series Stationarization unifies the
statistics of each input and converts the output with restored statistics for
better predictability. To address the over-stationarization problem,
De-stationary Attention is devised to recover the intrinsic non-stationary
information into temporal dependencies by approximating distinguishable
attentions learned from raw series. Our Non-stationary Transformers framework
consistently boosts mainstream Transformers by a large margin, reducing MSE
by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer, making
them state-of-the-art in time series forecasting. Code is available at
https://github.com/thuml/Nonstationary_Transformers.
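The abstract describes both modules concretely enough to sketch. Below is a minimal PyTorch sketch under simplifying assumptions: shapes are illustrative, and the small MLP projectors that the paper uses to learn the scale tau and the shift delta from the raw series' statistics are elided. The linked repository contains the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeriesStationarization(nn.Module):
    """Normalize each input window by its own statistics, then restore
    those statistics on the model output, as described in the abstract."""
    def normalize(self, x):                      # x: (batch, time, channels)
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True) + 1e-5  # avoid division by zero
        return (x - mean) / std, mean, std

    def denormalize(self, y, mean, std):         # y: forecast on stationarized input
        return y * std + mean

class DeStationaryAttention(nn.Module):
    """Scaled dot-product attention with a learned scale tau and shift delta
    that re-inject the non-stationary information removed by normalization."""
    def __init__(self, d_head):
        super().__init__()
        self.d_head = d_head

    def forward(self, q, k, v, tau, delta):
        # q, k, v: (batch, heads, time, d_head)
        # tau: (batch, 1, 1, 1) positive scale; delta: (batch, 1, 1, time) shift
        scores = (tau * (q @ k.transpose(-2, -1)) + delta) / self.d_head ** 0.5
        return F.softmax(scores, dim=-1) @ v
```

Series Stationarization wraps the backbone (normalize, forecast, denormalize), while De-stationary Attention replaces plain scaled dot-product attention inside it.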
Related papers
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
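A hedged sketch of the mechanism the summary describes: causal attention whose weights are reweighted by a smooth power-law decay. The exponent alpha and the exact reweighting are assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def powerlaw_causal_attention(q, k, v, alpha=1.0):
    # q, k, v: (batch, heads, time, d_head)
    t = q.size(-2)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    # Distance of each key from the query position (1, 2, ... into the past).
    dist = (torch.arange(t).view(-1, 1) - torch.arange(t).view(1, -1)).clamp(min=0) + 1
    decay = dist.float().pow(-alpha).log()       # heavy-tailed decay, in log space
    future = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores + decay.to(scores.device)    # reweight by power-law decay
    scores = scores.masked_fill(future.to(scores.device), float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```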
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
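A minimal sketch of the stated idea, i.e. deriving token embeddings from recurrent networks run at several temporal resolutions, so that order is captured by recurrence rather than positional embeddings. The pooling scales and layer sizes are assumptions; PRE's actual design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidalRecurrentEmbedding(nn.Module):
    def __init__(self, in_dim, d_model, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.grus = nn.ModuleList(
            nn.GRU(in_dim, d_model // len(scales), batch_first=True) for _ in scales
        )

    def forward(self, x):                        # x: (batch, time, in_dim)
        outs = []
        for scale, gru in zip(self.scales, self.grus):
            xs = x if scale == 1 else F.avg_pool1d(
                x.transpose(1, 2), kernel_size=scale).transpose(1, 2)
            h, _ = gru(xs)                       # order is encoded by recurrence
            # Upsample coarse scales back to the original length.
            h = F.interpolate(h.transpose(1, 2), size=x.size(1)).transpose(1, 2)
            outs.append(h)
        return torch.cat(outs, dim=-1)           # (batch, time, d_model)

# tokens = PyramidalRecurrentEmbedding(7, 96)(torch.randn(8, 48, 7))
# tokens then feed a standard nn.TransformerEncoder with no positional encoding.
```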
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Considering Nonstationary within Multivariate Time Series with Variational Hierarchical Transformer for Forecasting [12.793705636683402]
We develop a powerful hierarchical probabilistic generative module to consider the non-stationarity and intrinsic characteristics within MTS.
We then combine it with a Transformer to obtain a well-defined variational generative dynamic model named Hierarchical Time series Variational Transformer (HTV-Trans).
Being a powerful probabilistic model, HTV-Trans is utilized to learn expressive representations of MTS and applied to forecasting tasks.
arXiv Detail & Related papers (2024-03-08T16:04:36Z)
- Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
The attention mechanism, Transformers' key feature, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
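A hedged sketch of the stated idea: treat the attention-weight matrix itself, rather than the attended values, as the series representation. The linear forecasting head is illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAsRepresentation(nn.Module):
    def __init__(self, in_dim, d_model, seq_len, horizon):
        super().__init__()
        self.q = nn.Linear(in_dim, d_model)
        self.k = nn.Linear(in_dim, d_model)
        self.head = nn.Linear(seq_len * seq_len, horizon)

    def forward(self, x):                     # x: (batch, seq_len, in_dim)
        q, k = self.q(x), self.k(x)
        attn = F.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        # The (seq_len x seq_len) weight matrix is the representation.
        return self.head(attn.flatten(1))     # (batch, horizon)
```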
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
- Transformers with Attentive Federated Aggregation for Time Series Stock Forecasting [15.968396756506236]
Transformers have come into widespread use in time series modeling.
Their adaptation to time series forecasting, however, has remained limited, with both promising and inconsistent results.
We propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises.
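A minimal sketch of the federated side: clients train local transformers and a server aggregates their weights. Plain dataset-size-weighted averaging (FedAvg-style) stands in for the paper's attentive aggregation, whose weighting is not described in the summary.

```python
import torch

def aggregate(client_states, client_sizes):
    """Average client state_dicts, weighted by local dataset size."""
    total = sum(client_sizes)
    agg = {}
    for name in client_states[0]:
        agg[name] = sum(
            (n / total) * state[name].float()
            for state, n in zip(client_states, client_sizes)
        )
    return agg

# server_model.load_state_dict(aggregate([m.state_dict() for m in client_models],
#                                        [len(d) for d in client_datasets]))
```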
arXiv Detail & Related papers (2024-01-22T07:33:28Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
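A minimal sketch of the inverted layout: each variate's whole input series is embedded as one token, so attention operates across variates rather than time steps. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class InvertedForecaster(nn.Module):
    def __init__(self, seq_len, horizon, d_model=128, nhead=8, layers=2):
        super().__init__()
        self.embed = nn.Linear(seq_len, d_model)        # one token per variate
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.project = nn.Linear(d_model, horizon)

    def forward(self, x):                     # x: (batch, time, variates)
        tokens = self.embed(x.transpose(1, 2))     # (batch, variates, d_model)
        out = self.encoder(tokens)                 # attention across variates
        return self.project(out).transpose(1, 2)   # (batch, horizon, variates)

# y = InvertedForecaster(seq_len=96, horizon=24)(torch.randn(8, 96, 7))
```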
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- GBT: Two-stage transformer framework for non-stationary time series forecasting [3.830797055092574]
We propose GBT, a novel two-stage Transformer framework with Good Beginning.
It decouples the prediction process of TSFT into two stages: an Auto-Regression stage and a Self-Regression stage.
Experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA TSFTs with only canonical attention and convolution.
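A hedged sketch of the general two-stage idea described above: a first stage produces an initial forecast (the "good beginning") and a second stage refines it. Both stages here are illustrative placeholders, not GBT's actual Auto-Regression and Self-Regression blocks.

```python
import torch
import torch.nn as nn

class TwoStageForecaster(nn.Module):
    def __init__(self, in_dim, seq_len, horizon, d_model=64):
        super().__init__()
        self.stage1 = nn.Linear(seq_len * in_dim, horizon * in_dim)  # coarse forecast
        self.stage2 = nn.Sequential(                                  # refinement
            nn.Conv1d(in_dim, d_model, 3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, in_dim, 3, padding=1),
        )
        self.h, self.c = horizon, in_dim

    def forward(self, x):                       # x: (batch, seq_len, in_dim)
        coarse = self.stage1(x.flatten(1)).view(-1, self.h, self.c)
        refined = self.stage2(coarse.transpose(1, 2)).transpose(1, 2)
        return coarse + refined                 # residual refinement
```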
arXiv Detail & Related papers (2023-07-17T07:55:21Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
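A hedged sketch of what a robust forecasting loss can look like: absolute error, which is less outlier-sensitive than MSE, with far-horizon steps down-weighted. The weighting below is an assumption; CARD's exact loss may differ.

```python
import torch

def robust_horizon_loss(pred, target):
    # pred, target: (batch, horizon, variates)
    horizon = pred.size(1)
    # Down-weight distant horizons, where supervision is noisier.
    w = torch.arange(1, horizon + 1, dtype=pred.dtype, device=pred.device) ** -0.5
    return ((pred - target).abs() * w.view(1, -1, 1)).mean()
```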
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting [24.510978166050293]
This work is the first attempt to propose a Non-Autoregressive Transformer architecture for time series forecasting.
We present a novel spatial-temporal attention mechanism that uses a learned temporal influence map to bridge the gap between spatial and temporal attention.
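A minimal sketch of non-autoregressive decoding: one learned query per future step attends to the encoded history, so all horizons are emitted in a single forward pass. The learned temporal influence map itself is paper-specific and not reproduced here.

```python
import torch
import torch.nn as nn

class NonAutoregressiveDecoder(nn.Module):
    def __init__(self, horizon, d_model=64, nhead=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(horizon, d_model))
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.out = nn.Linear(d_model, 1)

    def forward(self, memory):                # memory: (batch, time, d_model)
        q = self.queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        dec, _ = self.attn(q, memory, memory)  # all steps decoded at once
        return self.out(dec)                   # (batch, horizon, 1)
```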
arXiv Detail & Related papers (2021-02-10T18:36:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.