U-shaped Transformer: Retain High Frequency Context in Time Series Analysis
- URL: http://arxiv.org/abs/2307.09019v2
- Date: Sun, 8 Oct 2023 07:41:58 GMT
- Title: U-shaped Transformer: Retain High Frequency Context in Time Series Analysis
- Authors: Qingkui Chen, Yiqin Zhang
- Abstract summary: In this paper, we consider the low-pass characteristics of transformers and try to incorporate the advantages of MLPs.
We introduce patch merge and split operation to extract features with different scales and use larger datasets to fully make use of the transformer backbone.
Our experiments demonstrate that the model performs at an advanced level across multiple datasets with relatively low cost.
- Score: 0.5710971447109949
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series prediction plays a crucial role in various industrial fields. In recent years, neural networks with a transformer backbone have achieved remarkable success in many domains, including computer vision and NLP. In the time series analysis domain, however, some studies have suggested that even the simplest MLP networks outperform advanced transformer-based networks on time series forecasting tasks. We believe these findings indicate that time series sequences have low-rank properties. In this paper, we consider the low-pass characteristics of transformers and try to incorporate the advantages of MLPs. We adopt skip-layer connections inspired by Unet into the traditional transformer backbone, thus preserving high-frequency context from input to output; we call the result the U-shaped Transformer. We introduce patch merge and split operations to extract features at different scales, and use larger datasets to make full use of the transformer backbone. Our experiments demonstrate that the model performs at an advanced level across multiple datasets with relatively low cost.
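The abstract's two structural ideas can be sketched concretely: patch merge halves the sequence length on the way down, patch split restores it on the way up, and a Unet-style skip connection re-injects encoder features so high-frequency detail survives to the output. This is a minimal plain-Python sketch with hypothetical helper names, not the paper's implementation (which applies transformer blocks at each scale):

```python
def patch_merge(seq):
    """Merge adjacent patches: length L -> L/2, feature dim d -> 2d."""
    return [seq[i] + seq[i + 1] for i in range(0, len(seq), 2)]

def patch_split(seq, dim):
    """Inverse of patch_merge: split each 2d vector back into two d vectors."""
    out = []
    for v in seq:
        out.append(v[:dim])
        out.append(v[dim:])
    return out

def skip_add(decoder_seq, encoder_seq):
    """Unet-style skip: element-wise add encoder features at the same scale."""
    return [[a + b for a, b in zip(x, y)]
            for x, y in zip(decoder_seq, encoder_seq)]

# Toy forward pass with an identity "transformer block" at the bottleneck.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # length 4, dim 2
enc = x                           # encoder features kept for the skip path
merged = patch_merge(enc)         # length 2, dim 4 (coarse scale)
bottleneck = merged               # a real model applies attention here
dec = patch_split(bottleneck, 2)  # back to length 4, dim 2
out = skip_add(dec, enc)          # high-frequency context re-injected
```

The skip path is what distinguishes this from a plain encoder-decoder stack: coarse-scale attention alone would act as a low-pass filter, which is exactly the property the paper aims to counteract.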
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
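For context on the reliance being criticized above, here is a sketch of the standard sinusoidal positional embedding that a module like PRE would replace (the shapes and the helper name are illustrative, not from the paper):

```python
import math

def sinusoidal_pe(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # even indices use sin, odd indices use cos, at geometrically
            # spaced frequencies (the original Transformer formulation)
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

pe = sinusoidal_pe(4, 8)  # added element-wise to the token embeddings
```

Because these encodings are fixed functions of the index rather than of the signal, they carry no information about the series itself, which is the gap a learned recurrent embedding is meant to close.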
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting [28.646457377816795]
We introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ).
Our methodology capitalizes on a sparse vector quantization technique coupled with Reverse Instance Normalization (RevIN) to reduce noise impact.
Our FFN-free approach trims the parameter count, enhancing computational efficiency and reducing overfitting.
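Reverse Instance Normalization (RevIN), which the summary above pairs with vector quantization, normalizes each series instance before the model and restores its statistics on the way out. A plain-Python sketch of the two halves (the model in between is omitted):

```python
import math

def revin_normalize(x, eps=1e-5):
    """Remove per-instance mean and scale; return stats for the reverse step."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var + eps)
    return [(v - mean) / std for v in x], (mean, std)

def revin_denormalize(y, stats):
    """Restore the instance statistics to the model's output."""
    mean, std = stats
    return [v * std + mean for v in y]

series = [10.0, 12.0, 11.0, 13.0]
normed, stats = revin_normalize(series)
# ... the forecaster would operate in the normalized space here ...
restored = revin_denormalize(normed, stats)  # recovers the original scale
```

The point of the reverse step is that distribution shift between training and test series is absorbed by the per-instance statistics rather than by the model weights.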
arXiv Detail & Related papers (2024-02-08T17:09:12Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
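The "inversion" above means that instead of treating each time step as a token, each variate's whole series becomes one token, so attention runs across variates and the feed-forward network runs along each series. A shape-level sketch with plain lists (names are illustrative):

```python
def invert(sample):
    """Transpose one sample from (time, variates) to (variates, time)."""
    return [list(col) for col in zip(*sample)]

# 3 time steps, 2 variates
x = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]

tokens = invert(x)  # 2 tokens, each a length-3 series; attention is applied
                    # across these variate tokens, the FFN along each series
```

The transpose itself is trivial; the consequence is that attention weights now express relationships between variables rather than between time steps.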
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture temporal correlations among signals.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
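The summary above does not give the form of CARD's robust loss, so as an illustration of the general idea here is the Huber loss, a standard robust alternative to MSE that grows only linearly for large errors and therefore limits the influence of outlying targets:

```python
def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(t - p)
        total += 0.5 * e * e if e <= delta else delta * (e - 0.5 * delta)
    return total / len(y_true)

loss = huber([1.0, 2.0, 8.0], [1.0, 2.0, 2.0])  # the outlier contributes
                                                # linearly, not quadratically
```

Any loss with this damped-tail shape reduces the degree to which a few noisy observations dominate the gradient, which is one route to the overfitting relief the summary mentions.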
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- W-Transformers: A Wavelet-based Transformer Framework for Univariate Time Series Forecasting [7.075125892721573]
We build a transformer model for non-stationary time series using wavelet-based transformer encoder architecture.
We evaluate our framework on several publicly available benchmark time series datasets from various domains.
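A wavelet-based front end of the kind described above splits a series into a smooth approximation and a high-frequency detail sub-band before modeling. This is a sketch using one level of the Haar transform, the simplest wavelet; the paper's choice of wavelet and decomposition depth may differ:

```python
import math

def haar_level1(x):
    """One level of the Haar DWT; len(x) must be even."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]  # smooth
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]  # high-freq
    return approx, detail

def haar_inverse(approx, detail):
    """Exact reconstruction from the two sub-bands."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

a, d = haar_level1([4.0, 6.0, 10.0, 12.0])
# each sub-band could be fed to its own transformer encoder before
# recombining; haar_inverse(a, d) reconstructs the original series exactly
```

Because the transform is invertible, nothing is lost in the split; the model simply sees the slow and fast components of a non-stationary series as separate inputs.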
arXiv Detail & Related papers (2022-09-08T17:39:38Z)
- CLMFormer: Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System [46.39662315849883]
Long-term time-series forecasting (LTSF) plays a crucial role in various practical applications.
Existing Transformer-based models, such as Fedformer and Informer, often achieve their best performances on validation sets after just a few epochs.
We propose a novel approach to address this issue by employing curriculum learning and introducing a memory-driven decoder.
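Curriculum learning as mentioned above orders training from easy to hard. The summary does not specify CLMFormer's schedule, so this generic sketch, in which the forecast horizon grows linearly over a warm-up period, is an assumption for illustration only:

```python
def horizon_schedule(epoch, max_horizon, warmup_epochs):
    """Linearly grow the prediction horizon from short to max_horizon."""
    frac = min(1.0, (epoch + 1) / warmup_epochs)
    return max(1, round(frac * max_horizon))

# early epochs train on short, easy horizons; later epochs on the full task
horizons = [horizon_schedule(e, 96, 10) for e in range(12)]
```

Any monotone difficulty measure (horizon length, noise level, sequence length) can drive such a schedule; the horizon is just the natural one for long-term forecasting.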
arXiv Detail & Related papers (2022-07-16T04:05:15Z)
- Transformers in Time Series: A Survey [66.50847574634726]
We systematically review Transformer schemes for time series modeling by highlighting their strengths as well as limitations.
From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers.
From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification.
arXiv Detail & Related papers (2022-02-15T01:43:27Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.