A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
- URL: http://arxiv.org/abs/2211.14730v1
- Date: Sun, 27 Nov 2022 05:15:42 GMT
- Title: A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
- Authors: Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam
- Abstract summary: We propose an efficient design of Transformer-based models for time series forecasting and self-supervised representation learning.
It is based on two key components: (i) segmentation of the time series into subseries-level patches that serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series.
PatchTST significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models.
- Score: 4.635547236305835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an efficient design of Transformer-based models for multivariate
time series forecasting and self-supervised representation learning. It is
based on two key components: (i) segmentation of the time series into
subseries-level patches, which serve as input tokens to the Transformer; (ii)
channel-independence, where each channel contains a single univariate time
series and all channels share the same embedding and Transformer weights. The
patching design naturally has a three-fold benefit: local semantic information
is retained in the embedding; computation and memory usage of the attention
maps are quadratically reduced for the same look-back window; and the model
can attend to a longer history. Our channel-independent patch time series
Transformer (PatchTST) significantly improves long-term forecasting accuracy
compared with SOTA Transformer-based models. We also apply our model to
self-supervised pre-training tasks and attain excellent fine-tuning
performance, which outperforms supervised training on large datasets.
Transferring a masked pre-trained representation from one dataset to others
also produces SOTA forecasting accuracy. Code is available at:
https://github.com/yuqinie98/PatchTST.
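
To make the two components concrete, here is a minimal sketch of patching and channel-independence in PyTorch. It is not the authors' implementation (see the repository above for that); the names PatchEmbed, PatchTSTSketch, patch_len, stride, and horizon, the end-padding choice, and the flatten-plus-linear forecasting head are illustrative assumptions, while the overall flow follows the abstract: each univariate channel is cut into overlapping patches, all channels share the same patch embedding and Transformer encoder, and the encoded patches are mapped to the forecast horizon.

```python
# Minimal PatchTST-style sketch (not the official implementation; names are illustrative).
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split each univariate channel into subseries-level patches and embed them as tokens."""

    def __init__(self, patch_len: int, stride: int, d_model: int):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        self.proj = nn.Linear(patch_len, d_model)  # one embedding shared by every channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, look_back); pad the end by repeating the last value
        # so the tail is covered, giving N = (L - P) // S + 2 patches.
        x = torch.cat([x, x[..., -1:].repeat(1, 1, self.stride)], dim=-1)
        patches = x.unfold(-1, self.patch_len, self.stride)  # (B, C, N, patch_len)
        return self.proj(patches)                            # (B, C, N, d_model)


class PatchTSTSketch(nn.Module):
    def __init__(self, look_back: int, horizon: int, patch_len: int = 16, stride: int = 8,
                 d_model: int = 128, n_heads: int = 8, n_layers: int = 3):
        super().__init__()
        num_patches = (look_back - patch_len) // stride + 2
        self.embed = PatchEmbed(patch_len, stride, d_model)
        self.pos = nn.Parameter(torch.zeros(num_patches, d_model))  # learnable positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)       # shared by all channels
        self.head = nn.Linear(num_patches * d_model, horizon)       # flatten + linear head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, look_back) -> forecast: (batch, channels, horizon)
        B, C, _ = x.shape
        tokens = self.embed(x) + self.pos                     # (B, C, N, d_model)
        tokens = tokens.reshape(B * C, -1, tokens.size(-1))   # channel-independence: fold C into batch
        out = self.head(self.encoder(tokens).flatten(1))      # (B*C, horizon)
        return out.view(B, C, -1)


# Example: 32 samples, 7 channels, 512-step look-back, 96-step horizon -> (32, 7, 96)
model = PatchTSTSketch(look_back=512, horizon=96)
y_hat = model(torch.randn(32, 7, 512))
```

Under these illustrative defaults (patch length 16, stride 8, end padding), a 512-step look-back becomes 64 tokens per channel, the "64 words" of the title, so the attention maps scale with 64^2 instead of 512^2: this is the quadratic reduction in computation and memory the abstract refers to, and it is also what lets the model attend to a longer history at the same cost.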
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains.
We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions.
The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present the Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- DuETT: Dual Event Time Transformer for Electronic Health Records [14.520791492631114]
We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions.
DuETT uses an aggregated input where sparse time series are transformed into a regular sequence with fixed length.
Our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.
arXiv Detail & Related papers (2023-04-25T17:47:48Z)
- W-Transformers: A Wavelet-based Transformer Framework for Univariate Time Series Forecasting [7.075125892721573]
We build a Transformer model for non-stationary time series using a wavelet-based Transformer encoder architecture.
We evaluate our framework on several publicly available benchmark time series datasets from various domains.
arXiv Detail & Related papers (2022-09-08T17:39:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.