Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
- URL: http://arxiv.org/abs/2205.14415v4
- Date: Fri, 24 Nov 2023 09:01:12 GMT
- Title: Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
- Authors: Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long
- Abstract summary: We propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention.
Our framework consistently boosts mainstream Transformers by a large margin, reducing MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformers have shown great power in time series forecasting due to their
global-range modeling ability. However, their performance can degrade severely
on non-stationary real-world data, in which the joint distribution changes over
time. Previous studies primarily adopt stationarization to attenuate the
non-stationarity of the original series for better predictability. But
stationarized series, deprived of their inherent non-stationarity, can be less
instructive for forecasting real-world bursty events. This problem, termed
over-stationarization in this paper, leads Transformers to generate
indistinguishable temporal attentions for different series and impedes the
predictive capability of deep models. To tackle the dilemma between series
predictability and model capability, we propose Non-stationary Transformers as
a generic framework with two interdependent modules: Series Stationarization
and De-stationary Attention. Concretely, Series Stationarization unifies the
statistics of each input and converts the output with restored statistics for
better predictability. To address the over-stationarization problem,
De-stationary Attention is devised to recover the intrinsic non-stationary
information into temporal dependencies by approximating distinguishable
attentions learned from raw series. Our Non-stationary Transformers framework
consistently boosts mainstream Transformers by a large margin, reducing MSE
by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer, making
them state-of-the-art in time series forecasting. Code is available at
https://github.com/thuml/Nonstationary_Transformers.
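The abstract describes both modules concretely enough to sketch. Below is a minimal PyTorch sketch under simplifying assumptions: shapes are illustrative, and the small MLP projectors that the paper uses to learn the scale tau and the shift delta from the raw series' statistics are elided. The linked repository contains the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeriesStationarization(nn.Module):
    """Normalize each input window by its own statistics, then restore
    those statistics on the model output, as described in the abstract."""
    def normalize(self, x):                      # x: (batch, time, channels)
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True) + 1e-5  # avoid division by zero
        return (x - mean) / std, mean, std

    def denormalize(self, y, mean, std):         # y: forecast on stationarized input
        return y * std + mean

class DeStationaryAttention(nn.Module):
    """Scaled dot-product attention with a learned scale tau and shift delta
    that re-inject the non-stationary information removed by normalization."""
    def __init__(self, d_head):
        super().__init__()
        self.d_head = d_head

    def forward(self, q, k, v, tau, delta):
        # q, k, v: (batch, heads, time, d_head)
        # tau: (batch, 1, 1, 1) positive scale; delta: (batch, 1, 1, time) shift
        scores = (tau * (q @ k.transpose(-2, -1)) + delta) / self.d_head ** 0.5
        return F.softmax(scores, dim=-1) @ v
```

Series Stationarization wraps the backbone (normalize, forecast, denormalize), while De-stationary Attention replaces plain scaled dot-product attention inside it.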
Related papers
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
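A hedged sketch of the mechanism the summary describes: causal attention whose weights are reweighted by a smooth power-law decay. The exponent alpha and the exact reweighting are assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def powerlaw_causal_attention(q, k, v, alpha=1.0):
    # q, k, v: (batch, heads, time, d_head)
    t = q.size(-2)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    # Distance of each key from the query position (1, 2, ... into the past).
    dist = (torch.arange(t).view(-1, 1) - torch.arange(t).view(1, -1)).clamp(min=0) + 1
    decay = dist.float().pow(-alpha).log()       # heavy-tailed decay, in log space
    future = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores + decay.to(scores.device)    # reweight by power-law decay
    scores = scores.masked_fill(future.to(scores.device), float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```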
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
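A minimal sketch of the stated idea, i.e. deriving token embeddings from recurrent networks run at several temporal resolutions, so that order is captured by recurrence rather than positional embeddings. The pooling scales and layer sizes are assumptions; PRE's actual design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidalRecurrentEmbedding(nn.Module):
    def __init__(self, in_dim, d_model, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.grus = nn.ModuleList(
            nn.GRU(in_dim, d_model // len(scales), batch_first=True) for _ in scales
        )

    def forward(self, x):                        # x: (batch, time, in_dim)
        outs = []
        for scale, gru in zip(self.scales, self.grus):
            xs = x if scale == 1 else F.avg_pool1d(
                x.transpose(1, 2), kernel_size=scale).transpose(1, 2)
            h, _ = gru(xs)                       # order is encoded by recurrence
            # Upsample coarse scales back to the original length.
            h = F.interpolate(h.transpose(1, 2), size=x.size(1)).transpose(1, 2)
            outs.append(h)
        return torch.cat(outs, dim=-1)           # (batch, time, d_model)

# tokens = PyramidalRecurrentEmbedding(7, 96)(torch.randn(8, 48, 7))
# tokens then feed a standard nn.TransformerEncoder with no positional encoding.
```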
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Considering Nonstationary within Multivariate Time Series with Variational Hierarchical Transformer for Forecasting [12.793705636683402]
We develop a powerful hierarchical probabilistic generative module to consider the non-stationarity and intrinsic characteristics within MTS.
We then combine it with a Transformer to obtain a well-defined variational generative dynamic model named Hierarchical Time series Variational Transformer (HTV-Trans).
Being a powerful probabilistic model, HTV-Trans is utilized to learn expressive representations of MTS and applied to forecasting tasks.
arXiv Detail & Related papers (2024-03-08T16:04:36Z)
- Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
The attention mechanism, Transformers' key feature, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
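A hedged sketch of the stated idea: treat the attention-weight matrix itself, rather than the attended values, as the series representation. The linear forecasting head is illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAsRepresentation(nn.Module):
    def __init__(self, in_dim, d_model, seq_len, horizon):
        super().__init__()
        self.q = nn.Linear(in_dim, d_model)
        self.k = nn.Linear(in_dim, d_model)
        self.head = nn.Linear(seq_len * seq_len, horizon)

    def forward(self, x):                     # x: (batch, seq_len, in_dim)
        q, k = self.q(x), self.k(x)
        attn = F.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        # The (seq_len x seq_len) weight matrix is the representation.
        return self.head(attn.flatten(1))     # (batch, horizon)
```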
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
- Transformers with Attentive Federated Aggregation for Time Series Stock Forecasting [15.968396756506236]
Transformers have come into widespread use in time series modeling.
Their adaptation to time series forecasting, however, has remained limited, with both promising and inconsistent results.
We propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises.
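A minimal sketch of the federated side: clients train local transformers and a server aggregates their weights. Plain dataset-size-weighted averaging (FedAvg-style) stands in for the paper's attentive aggregation, whose weighting is not described in the summary.

```python
import torch

def aggregate(client_states, client_sizes):
    """Average client state_dicts, weighted by local dataset size."""
    total = sum(client_sizes)
    agg = {}
    for name in client_states[0]:
        agg[name] = sum(
            (n / total) * state[name].float()
            for state, n in zip(client_states, client_sizes)
        )
    return agg

# server_model.load_state_dict(aggregate([m.state_dict() for m in client_models],
#                                        [len(d) for d in client_datasets]))
```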
arXiv Detail & Related papers (2024-01-22T07:33:28Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
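A minimal sketch of the inverted layout: each variate's whole input series is embedded as one token, so attention operates across variates rather than time steps. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class InvertedForecaster(nn.Module):
    def __init__(self, seq_len, horizon, d_model=128, nhead=8, layers=2):
        super().__init__()
        self.embed = nn.Linear(seq_len, d_model)        # one token per variate
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.project = nn.Linear(d_model, horizon)

    def forward(self, x):                     # x: (batch, time, variates)
        tokens = self.embed(x.transpose(1, 2))     # (batch, variates, d_model)
        out = self.encoder(tokens)                 # attention across variates
        return self.project(out).transpose(1, 2)   # (batch, horizon, variates)

# y = InvertedForecaster(seq_len=96, horizon=24)(torch.randn(8, 96, 7))
```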
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- GBT: Two-stage transformer framework for non-stationary time series forecasting [3.830797055092574]
We propose GBT, a novel two-stage Transformer framework with Good Beginning.
It decouples the prediction process of TSFT into two stages: an Auto-Regression stage and a Self-Regression stage.
Experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA TSFTs with only canonical attention and convolution.
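A hedged sketch of the general two-stage idea described above: a first stage produces an initial forecast (the "good beginning") and a second stage refines it. Both stages here are illustrative placeholders, not GBT's actual Auto-Regression and Self-Regression blocks.

```python
import torch
import torch.nn as nn

class TwoStageForecaster(nn.Module):
    def __init__(self, in_dim, seq_len, horizon, d_model=64):
        super().__init__()
        self.stage1 = nn.Linear(seq_len * in_dim, horizon * in_dim)  # coarse forecast
        self.stage2 = nn.Sequential(                                  # refinement
            nn.Conv1d(in_dim, d_model, 3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, in_dim, 3, padding=1),
        )
        self.h, self.c = horizon, in_dim

    def forward(self, x):                       # x: (batch, seq_len, in_dim)
        coarse = self.stage1(x.flatten(1)).view(-1, self.h, self.c)
        refined = self.stage2(coarse.transpose(1, 2)).transpose(1, 2)
        return coarse + refined                 # residual refinement
```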
arXiv Detail & Related papers (2023-07-17T07:55:21Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
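A hedged sketch of what a robust forecasting loss can look like: absolute error, which is less outlier-sensitive than MSE, with far-horizon steps down-weighted. The weighting below is an assumption; CARD's exact loss may differ.

```python
import torch

def robust_horizon_loss(pred, target):
    # pred, target: (batch, horizon, variates)
    horizon = pred.size(1)
    # Down-weight distant horizons, where supervision is noisier.
    w = torch.arange(1, horizon + 1, dtype=pred.dtype, device=pred.device) ** -0.5
    return ((pred - target).abs() * w.view(1, -1, 1)).mean()
```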
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting [24.510978166050293]
This work is the first attempt to propose a Non-Autoregressive Transformer architecture for time series forecasting.
We present a novel spatial-temporal attention mechanism that uses a learned temporal influence map to bridge the gap between spatial and temporal attention.
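A minimal sketch of non-autoregressive decoding: one learned query per future step attends to the encoded history, so all horizons are emitted in a single forward pass. The learned temporal influence map itself is paper-specific and not reproduced here.

```python
import torch
import torch.nn as nn

class NonAutoregressiveDecoder(nn.Module):
    def __init__(self, horizon, d_model=64, nhead=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(horizon, d_model))
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.out = nn.Linear(d_model, 1)

    def forward(self, memory):                # memory: (batch, time, d_model)
        q = self.queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        dec, _ = self.attn(q, memory, memory)  # all steps decoded at once
        return self.out(dec)                   # (batch, horizon, 1)
```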
arXiv Detail & Related papers (2021-02-10T18:36:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.