ChunkFormer: Learning Long Time Series with Multi-stage Chunked
Transformer
- URL: http://arxiv.org/abs/2112.15087v1
- Date: Thu, 30 Dec 2021 15:06:32 GMT
- Title: ChunkFormer: Learning Long Time Series with Multi-stage Chunked
Transformer
- Authors: Yue Ju, Alka Isac and Yimin Nie
- Abstract summary: Original Transformer-based models adopt an attention mechanism to discover global information along a sequence.
ChunkFormer splits the long sequences into smaller sequence chunks for the attention calculation.
In this way, the proposed model gradually learns both local and global information without changing the total length of the input sequences.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The analysis of long sequence data remains challenging in many real-world
applications. We propose a novel architecture, ChunkFormer, that improves on the
existing Transformer framework to address these challenges when dealing with long
time series. Original Transformer-based models adopt an attention mechanism to
discover global information along a sequence and leverage its contextual data.
In long sequential data, however, local information such as seasonality and
short-term fluctuations is confined to short sub-sequences and is easily missed
by global attention. In addition, the original Transformer consumes more
resources because it carries the full attention matrix over the entire sequence
during training. To overcome these challenges, ChunkFormer splits long sequences
into smaller chunks for the attention calculation, progressively applying
different chunk sizes in each stage. In this way, the proposed model gradually
learns both local and global information without changing the total length of
the input sequences. We have extensively tested the effectiveness of this new
architecture on different business domains and demonstrated its advantage over
existing Transformer-based models.
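To make the chunking idea concrete, the sketch below shows multi-stage chunked self-attention in PyTorch. It is a minimal illustration rather than the authors' implementation: the module choices (nn.MultiheadAttention with residual LayerNorm), the model width, and the chunk schedule (16, 64, 256) are assumptions; the abstract only states that attention is computed inside chunks whose size changes from stage to stage, presumably growing so the model moves from local to global context, while the total sequence length stays unchanged.

```python
# Minimal sketch of multi-stage chunked self-attention (illustrative only).
# Assumptions: chunk sizes, dimensions, and the plain nn.MultiheadAttention
# backbone are not taken from the paper; they stand in for whatever the
# authors use inside each stage.
import torch
import torch.nn as nn


class ChunkedAttentionStage(nn.Module):
    """Self-attention applied independently inside fixed-size chunks."""

    def __init__(self, d_model: int, n_heads: int, chunk_size: int):
        super().__init__()
        self.chunk_size = chunk_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); seq_len assumed divisible by chunk_size
        b, t, d = x.shape
        chunks = x.reshape(b * (t // self.chunk_size), self.chunk_size, d)
        out, _ = self.attn(chunks, chunks, chunks)  # attention stays within a chunk
        out = self.norm(chunks + out)               # residual connection + norm
        return out.reshape(b, t, d)                 # total sequence length unchanged


class ChunkFormerSketch(nn.Module):
    """Stacks stages with progressively larger chunks (local -> global)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4,
                 chunk_sizes: tuple = (16, 64, 256)):
        super().__init__()
        self.stages = nn.ModuleList(
            ChunkedAttentionStage(d_model, n_heads, c) for c in chunk_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for stage in self.stages:
            x = stage(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 256, 64)            # (batch, time steps, features)
    print(ChunkFormerSketch()(x).shape)    # torch.Size([2, 256, 64])
```

Under these assumptions, early stages with small chunks capture short-range structure such as seasonality and local fluctuations, later stages with larger chunks mix information over wider windows, and each stage's attention matrix is only chunk_size x chunk_size, which is where the memory savings described in the abstract would come from.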
Related papers
- sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting [6.434378359932152]
We review and categorize existing Transformer-based models into two main types: (1) modifications to the model structure and (2) modifications to the input data.
We propose sTransformer, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information.
We compare our model with linear models and existing forecasting models on long-term time-series forecasting, achieving new state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T06:23:41Z) - Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures [46.58170057001437]
We introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences.
We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts.
arXiv Detail & Related papers (2024-05-31T14:00:44Z) - Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains.
We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions.
The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z) - Rough Transformers for Continuous and Efficient Time-Series Modelling [46.58170057001437]
Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals.
We introduce the Rough Transformer, a variation of the Transformer model which operates on continuous-time representations of input sequences.
We find that Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the benefits of Neural ODE-based models.
arXiv Detail & Related papers (2024-03-15T13:29:45Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present the Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z) - Client: Cross-variable Linear Integrated Enhanced Transformer for
Multivariate Long-Term Time Series Forecasting [4.004869317957185]
"Cross-variable Linear Integrated ENhanced Transformer for Multivariable Long-Term Time Series Forecasting" (Client) is an advanced model that outperforms both traditional Transformer-based models and linear models.
Client incorporates non-linearity and cross-variable dependencies, which sets it apart from conventional linear models and Transformer-based models.
arXiv Detail & Related papers (2023-05-30T08:31:22Z) - Infomaxformer: Maximum Entropy Transformer for Long Time-Series
Forecasting Problem [6.497816402045097]
The Transformer architecture yields state-of-the-art results in many tasks such as natural language processing (NLP) and computer vision (CV).
Despite this advanced capability, the quadratic time complexity and high memory usage prevent the Transformer from handling long time-series forecasting problems.
We propose a method that combines the encoder-decoder architecture with seasonal-trend decomposition to capture more specific seasonal parts.
arXiv Detail & Related papers (2023-01-04T14:08:21Z)