GBT: Two-stage transformer framework for non-stationary time series
forecasting
- URL: http://arxiv.org/abs/2307.08302v1
- Date: Mon, 17 Jul 2023 07:55:21 GMT
- Title: GBT: Two-stage transformer framework for non-stationary time series
forecasting
- Authors: Li Shen, Yuning Wei, Yangzhu Wang
- Abstract summary: We propose GBT, a novel two-stage Transformer framework with Good Beginning.
It decouples the prediction process of TSFT into two stages: an Auto-Regression stage and a Self-Regression stage.
Experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA TSFTs with only canonical attention and convolution.
- Score: 3.830797055092574
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper shows that the time series forecasting Transformer (TSFT)
suffers from a severe over-fitting problem caused by the improper initialization
of its unknown decoder inputs, especially when handling non-stationary time
series. Based on this observation, we propose GBT, a novel two-stage Transformer
framework with Good Beginning. It decouples the prediction process of TSFT into
two stages, an Auto-Regression stage and a Self-Regression stage, to tackle the
mismatch in statistical properties between the input and prediction sequences.
The prediction results of the Auto-Regression stage serve as a Good Beginning,
i.e., a better initialization for the inputs of the Self-Regression stage. We
also propose an Error Score Modification module to further enhance the
forecasting capability of the Self-Regression stage in GBT. Extensive
experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA
TSFTs (FEDformer, Pyraformer, ETSformer, etc.) and many other forecasting
models (SCINet, N-HiTS, etc.) with only canonical attention and convolution,
while having lower time and space complexity. It is also general enough to be
coupled with these models to strengthen their forecasting capability. The source
code is available at: https://github.com/OrigamiSL/GBT
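To make the two-stage decoupling concrete, the following is a minimal PyTorch-style
sketch of the idea: a first (Auto-Regression-like) stage maps the observed window
to a coarse forecast, and a second (Self-Regression-like) stage refines that coarse
forecast instead of starting from zero- or mean-padded decoder inputs. The class
name, layer choices, and sizes are illustrative assumptions, not the released GBT
architecture.

```python
import torch
import torch.nn as nn


class TwoStageForecaster(nn.Module):
    """Hypothetical sketch of a two-stage ("good beginning") forecaster."""

    def __init__(self, n_vars: int, input_len: int, pred_len: int, d_model: int = 64):
        super().__init__()
        # Stage 1: map the observed window directly to a coarse forecast per variable.
        self.stage1 = nn.Sequential(
            nn.Linear(input_len, d_model),
            nn.GELU(),
            nn.Linear(d_model, pred_len),
        )
        # Stage 2: refine the coarse forecast with canonical self-attention.
        self.embed = nn.Linear(n_vars, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=2 * d_model, batch_first=True
        )
        self.stage2 = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_vars)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_len, n_vars)
        coarse = self.stage1(x.transpose(1, 2)).transpose(1, 2)   # (batch, pred_len, n_vars)
        refined = coarse + self.head(self.stage2(self.embed(coarse)))
        return refined                                            # (batch, pred_len, n_vars)


# Shape check: 96 observed steps, 7 variables, 24-step horizon.
model = TwoStageForecaster(n_vars=7, input_len=96, pred_len=24)
print(model(torch.randn(32, 96, 7)).shape)  # torch.Size([32, 24, 7])
```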
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a masked encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- A Time Series is Worth 64 Words: Long-term Forecasting with Transformers [4.635547236305835]
We propose an efficient design of Transformer-based models for time series forecasting and self-supervised representation learning.
It is based on two key components: (i) segmentation of the time series into subseries-level patches that serve as input tokens to the Transformer, and (ii) channel-independence, where each channel contains a single univariate series sharing the same embedding and Transformer weights (see the patching sketch after this list).
PatchTST significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models.
arXiv Detail & Related papers (2022-11-27T05:15:42Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that addresses the limitations of these approaches by endowing each attention head with a mixed-membership Stochastic Block Model (SBM).
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
- Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting [86.33543833145457]
We propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention (a sketch of the stationarization step appears after this list).
Our framework consistently boosts mainstream Transformers by a large margin, reducing MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer.
arXiv Detail & Related papers (2022-05-28T12:27:27Z)
- Are Transformers Effective for Time Series Forecasting? [13.268196448051308]
Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task.
This study investigates whether Transformer-based techniques are the right solutions for long-term time series forecasting.
We find that the relatively higher long-term forecasting accuracy of Transformer-based solutions has little to do with the temporal relation extraction capabilities of the Transformer architecture.
arXiv Detail & Related papers (2022-05-26T17:17:08Z)
- TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting [6.393659160890665]
We propose the concept of the tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that the TCCT architectures greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z)
- Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, which is faster than all mainstream speech recognition models.
arXiv Detail & Related papers (2020-05-16T08:27:20Z)
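As a companion to the PatchTST entry above, here is a minimal sketch of
subseries-level patching, i.e., turning each univariate channel into overlapping
patches that can serve as Transformer input tokens. The patch length and stride
are illustrative defaults, not the paper's tuned values.

```python
import torch


def patchify(x: torch.Tensor, patch_len: int = 16, stride: int = 8) -> torch.Tensor:
    """Split each channel of x into overlapping subseries-level patches.

    x: (batch, seq_len, n_vars) -> (batch, n_vars, n_patches, patch_len)
    """
    x = x.permute(0, 2, 1)  # channel-independent view: (batch, n_vars, seq_len)
    return x.unfold(dimension=-1, size=patch_len, step=stride)


# A 96-step window with 7 channels yields 11 tokens of length 16 per channel.
tokens = patchify(torch.randn(32, 96, 7))
print(tokens.shape)  # torch.Size([32, 7, 11, 16])
```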
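Similarly, the Series Stationarization module named in the Non-stationary
Transformers entry follows a normalize-then-restore pattern; the sketch below
shows that pattern only, as an assumption rather than the authors'
implementation, and the paired De-stationary Attention module is omitted.

```python
import torch
import torch.nn as nn


class SeriesStationarization(nn.Module):
    """Normalize each input window by its own statistics; restore them on the output."""

    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_vars); per-window, per-variable statistics.
        self.mean = x.mean(dim=1, keepdim=True)
        self.std = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + self.eps)
        return (x - self.mean) / self.std

    def denormalize(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, pred_len, n_vars); broadcast the stored input statistics back in.
        return y * self.std + self.mean


# Wrap any forecaster `model` mapping (batch, 96, 7) -> (batch, 24, 7):
#   stat = SeriesStationarization()
#   y_hat = stat.denormalize(model(stat.normalize(x)))
```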
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.