A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
- URL: http://arxiv.org/abs/2211.14730v1
- Date: Sun, 27 Nov 2022 05:15:42 GMT
- Title: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
- Authors: Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam
- Abstract summary: We propose an efficient design of Transformer-based models for time series forecasting and self-supervised representation learning.
It is based on two key components: (i) segmentation of the time series into subseries-level patches that serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series.
PatchTST significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models.
- Score: 4.635547236305835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an efficient design of Transformer-based models for multivariate
time series forecasting and self-supervised representation learning. It is
based on two key components: (i) segmentation of the time series into
subseries-level patches, which serve as input tokens to the Transformer; (ii)
channel-independence, where each channel contains a single univariate time
series and all channels share the same embedding and Transformer weights. The
patching design naturally has a three-fold benefit: local semantic information
is retained in the embedding; computation and memory usage of the attention
maps are quadratically reduced for the same look-back window; and the model
can attend to a longer history. Our channel-independent patch time series
Transformer (PatchTST) significantly improves long-term forecasting accuracy
compared with SOTA Transformer-based models. We also apply our model to
self-supervised pre-training tasks and attain excellent fine-tuning
performance, which outperforms supervised training on large datasets.
Transferring a masked pre-trained representation from one dataset to others
also produces SOTA forecasting accuracy. Code is available at:
https://github.com/yuqinie98/PatchTST.
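
To make the two components concrete, here is a minimal sketch of patching and channel-independence in PyTorch. It is not the authors' implementation (see the repository above for that); the names PatchEmbed, PatchTSTSketch, patch_len, stride, and horizon, the end-padding choice, and the flatten-plus-linear forecasting head are illustrative assumptions, while the overall flow follows the abstract: each univariate channel is cut into overlapping patches, all channels share the same patch embedding and Transformer encoder, and the encoded patches are mapped to the forecast horizon.

```python
# Minimal PatchTST-style sketch (not the official implementation; names are illustrative).
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split each univariate channel into subseries-level patches and embed them as tokens."""

    def __init__(self, patch_len: int, stride: int, d_model: int):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        self.proj = nn.Linear(patch_len, d_model)  # one embedding shared by every channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, look_back); pad the end by repeating the last value
        # so the tail is covered, giving N = (L - P) // S + 2 patches.
        x = torch.cat([x, x[..., -1:].repeat(1, 1, self.stride)], dim=-1)
        patches = x.unfold(-1, self.patch_len, self.stride)  # (B, C, N, patch_len)
        return self.proj(patches)                            # (B, C, N, d_model)


class PatchTSTSketch(nn.Module):
    def __init__(self, look_back: int, horizon: int, patch_len: int = 16, stride: int = 8,
                 d_model: int = 128, n_heads: int = 8, n_layers: int = 3):
        super().__init__()
        num_patches = (look_back - patch_len) // stride + 2
        self.embed = PatchEmbed(patch_len, stride, d_model)
        self.pos = nn.Parameter(torch.zeros(num_patches, d_model))  # learnable positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)       # shared by all channels
        self.head = nn.Linear(num_patches * d_model, horizon)       # flatten + linear head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, look_back) -> forecast: (batch, channels, horizon)
        B, C, _ = x.shape
        tokens = self.embed(x) + self.pos                     # (B, C, N, d_model)
        tokens = tokens.reshape(B * C, -1, tokens.size(-1))   # channel-independence: fold C into batch
        out = self.head(self.encoder(tokens).flatten(1))      # (B*C, horizon)
        return out.view(B, C, -1)


# Example: 32 samples, 7 channels, 512-step look-back, 96-step horizon -> (32, 7, 96)
model = PatchTSTSketch(look_back=512, horizon=96)
y_hat = model(torch.randn(32, 7, 512))
```

Under these illustrative defaults (patch length 16, stride 8, end padding), a 512-step look-back becomes 64 tokens per channel, the "64 words" of the title, so the attention maps scale with 64^2 instead of 512^2: this is the quadratic reduction in computation and memory the abstract refers to, and it is also what lets the model attend to a longer history at the same cost.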
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains.
We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions.
The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present the Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- DuETT: Dual Event Time Transformer for Electronic Health Records [14.520791492631114]
We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions.
DuETT uses an aggregated input where sparse time series are transformed into a regular sequence with fixed length.
Our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.
arXiv Detail & Related papers (2023-04-25T17:47:48Z)
- W-Transformers: A Wavelet-based Transformer Framework for Univariate Time Series Forecasting [7.075125892721573]
We build a Transformer model for non-stationary time series using a wavelet-based Transformer encoder architecture.
We evaluate our framework on several publicly available benchmark time series datasets from various domains.
arXiv Detail & Related papers (2022-09-08T17:39:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.