Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization
for Enhanced Time Series Forecasting
- URL: http://arxiv.org/abs/2402.05830v1
- Date: Thu, 8 Feb 2024 17:09:12 GMT
- Title: Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization
for Enhanced Time Series Forecasting
- Authors: Yanjun Zhao, Tian Zhou, Chao Chen, Liang Sun, Yi Qian, Rong Jin
- Abstract summary: We introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ).
Our methodology capitalizes on a sparse vector quantization technique coupled with Reverse Instance Normalization (RevIN) to reduce noise impact.
Our FFN-free approach trims the parameter count, enhancing computational efficiency and reducing overfitting.
- Score: 28.646457377816795
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Time series analysis is vital for numerous applications, and transformers
have become increasingly prominent in this domain. Leading methods customize
the transformer architecture from NLP and CV, utilizing a patching technique to
convert continuous signals into segments. Yet, time series data are uniquely
challenging due to significant distribution shifts and intrinsic noise levels.
To address these two challenges, we introduce the Sparse Vector Quantized
FFN-Free Transformer (Sparse-VQ). Our methodology capitalizes on a sparse
vector quantization technique coupled with Reverse Instance Normalization
(RevIN) to reduce noise impact and capture sufficient statistics for
forecasting, serving as an alternative to the Feed-Forward layer (FFN) in the
transformer architecture. Our FFN-free approach trims the parameter count,
enhancing computational efficiency and reducing overfitting. Through
evaluations across ten benchmark datasets, including the newly introduced CAISO
dataset, Sparse-VQ surpasses leading models with a 7.84% and 4.17% decrease in
MAE for univariate and multivariate time series forecasting, respectively.
Moreover, it can be seamlessly integrated with existing transformer-based
models to elevate their performance.
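To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of one plausible reading: RevIN normalizes each instance against distribution shift, and a straight-through vector-quantization codebook occupies the slot the feed-forward layer would normally fill. The class names, codebook size, and wiring are illustrative assumptions, and the "sparse" selection here is plain nearest-codeword assignment; the paper's exact sparsification is not specified in the abstract.

    import torch
    import torch.nn as nn

    class RevIN(nn.Module):
        """Reversible instance normalization: normalize each series against its
        own statistics; outputs can be de-normalized with the saved statistics."""
        def __init__(self, num_features: int, eps: float = 1e-5):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(num_features))
            self.bias = nn.Parameter(torch.zeros(num_features))

        def forward(self, x):  # x: (batch, time, features)
            self.mean = x.mean(dim=1, keepdim=True).detach()
            self.std = (x.var(dim=1, keepdim=True, unbiased=False) + self.eps).sqrt().detach()
            return (x - self.mean) / self.std * self.weight + self.bias

        def invert(self, y):
            return (y - self.bias) / self.weight * self.std + self.mean

    class SparseVQ(nn.Module):
        """Nearest-codeword quantization with a straight-through gradient."""
        def __init__(self, dim: int, codebook_size: int = 256):
            super().__init__()
            self.codebook = nn.Embedding(codebook_size, dim)

        def forward(self, z):  # z: (batch, tokens, dim)
            w = self.codebook.weight                   # (codebook_size, dim)
            dist = (z.pow(2).sum(-1, keepdim=True)     # squared Euclidean distance
                    - 2 * z @ w.t()
                    + w.pow(2).sum(-1))
            z_q = self.codebook(dist.argmin(dim=-1))
            return z + (z_q - z).detach()              # straight-through estimator

    class FFNFreeBlock(nn.Module):
        """Transformer block with the feed-forward sublayer replaced by VQ."""
        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
            self.vq = SparseVQ(dim)

        def forward(self, x):  # x: (batch, tokens, dim)
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            return x + self.vq(self.norm2(x))

    x = torch.randn(8, 96, 64)              # toy (batch, tokens, dim) input
    print(FFNFreeBlock(dim=64)(x).shape)    # torch.Size([8, 96, 64])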
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating pyramidal recurrent embeddings (PRE) with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
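The summary does not define PRE beyond the name; as a rough, hedged sketch, one pyramidal recurrent embedding could run small GRUs over the series at several temporal resolutions, so recurrence encodes order directly and no positional embedding is needed. The class name, GRU choice, and pooling scheme are assumptions, not the paper's design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidalRecurrentEmbedding(nn.Module):
        """Summarize a series at several resolutions with small GRUs;
        recurrence supplies temporal order in place of positional embeddings."""
        def __init__(self, d_model: int, levels: int = 3):
            super().__init__()
            self.grus = nn.ModuleList(
                nn.GRU(1, d_model // levels, batch_first=True) for _ in range(levels)
            )

        def forward(self, x):  # x: (batch, time), one variate for simplicity
            states = []
            for level, gru in enumerate(self.grus):
                pooled = F.avg_pool1d(x.unsqueeze(1), 2 ** level, 2 ** level)
                _, h = gru(pooled.transpose(1, 2))  # h: (1, batch, hidden)
                states.append(h.squeeze(0))
            return torch.cat(states, dim=-1)        # (batch, d_model) encoder token

    emb = PyramidalRecurrentEmbedding(d_model=96)(torch.randn(4, 64))
    print(emb.shape)  # torch.Size([4, 96])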
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
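The inversion itself is simple to show concretely. In this minimal sketch (layer sizes illustrative), each variate's entire look-back window is embedded as one token, so attention mixes variates while the feed-forward network operates within each variate token:

    import torch
    import torch.nn as nn

    class InvertedEmbedding(nn.Module):
        """Embed each variate's whole history as a single token."""
        def __init__(self, seq_len: int, d_model: int):
            super().__init__()
            self.proj = nn.Linear(seq_len, d_model)

        def forward(self, x):  # x: (batch, time, variates)
            return self.proj(x.transpose(1, 2))  # (batch, variates, d_model)

    tokens = InvertedEmbedding(seq_len=96, d_model=128)(torch.randn(4, 96, 7))
    print(tokens.shape)  # torch.Size([4, 7, 128]); an encoder then attends across variates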
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- U-shaped Transformer: Retain High Frequency Context in Time Series Analysis [0.5710971447109949]
In this paper, we consider the low-pass characteristics of transformers and try to incorporate their advantages.
We introduce patch merge and split operations to extract features at different scales and use larger datasets to fully exploit the transformer backbone.
Our experiments demonstrate that the model performs at an advanced level across multiple datasets with relatively low cost.
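The summary names patch merge and split operations without detail; one plausible, U-Net-style reading folds adjacent tokens together to coarsen resolution and unfolds them to refine it. These helpers are illustrative assumptions:

    import torch

    def patch_merge(x: torch.Tensor, factor: int = 2) -> torch.Tensor:
        """Coarsen: fold `factor` adjacent tokens into one wider token."""
        batch, tokens, dim = x.shape
        return x.reshape(batch, tokens // factor, dim * factor)

    def patch_split(x: torch.Tensor, factor: int = 2) -> torch.Tensor:
        """Refine: the exact inverse of patch_merge."""
        batch, tokens, dim = x.shape
        return x.reshape(batch, tokens * factor, dim // factor)

    x = torch.randn(2, 8, 16)
    assert torch.equal(patch_split(patch_merge(x)), x)  # lossless round trip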
arXiv Detail & Related papers (2023-07-18T07:15:26Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
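The summary does not spell out the loss. As a stand-in illustration of a robust forecasting loss (explicitly not CARD's actual formulation), a Huber loss damps the influence of outlying errors that would otherwise drive overfitting:

    import torch
    import torch.nn.functional as F

    def robust_forecast_loss(pred: torch.Tensor, target: torch.Tensor,
                             delta: float = 1.0) -> torch.Tensor:
        """Quadratic for small errors, linear for large ones, so outliers
        contribute bounded gradients."""
        return F.huber_loss(pred, target, delta=delta)

    loss = robust_forecast_loss(torch.randn(8, 24), torch.randn(8, 24))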
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity, owing to their self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- W-Transformers : A Wavelet-based Transformer Framework for Univariate Time Series Forecasting [7.075125892721573]
We build a transformer model for non-stationary time series using a wavelet-based transformer encoder architecture.
We evaluate our framework on several publicly available benchmark time series datasets from various domains.
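A hedged sketch of the wavelet front end is below, using PyWavelets; the 'db4' wavelet and level count are illustrative choices, and per-component transformers would replace the comment:

    import numpy as np
    import pywt

    # Decompose a non-stationary series into multi-resolution components.
    series = np.sin(np.linspace(0, 20, 512)) + 0.1 * np.random.randn(512)
    coeffs = pywt.wavedec(series, "db4", level=3)  # [cA3, cD3, cD2, cD1]

    # ...in the framework, each coefficient series would be forecast separately;
    # recombining the (unmodified) components is lossless:
    reconstructed = pywt.waverec(coeffs, "db4")
    assert np.allclose(reconstructed[: len(series)], series)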
arXiv Detail & Related papers (2022-09-08T17:39:38Z)
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
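A minimal sketch of filtering in the transformed frequency domain follows, assuming a learned per-frequency mask; the parameterization is an assumption, and the paper's actual filter may differ:

    import torch
    import torch.nn as nn

    class FrequencyFilter(nn.Module):
        """Scale each rFFT bin with a learned mask, then transform back;
        bins the mask drives toward zero are filtered out."""
        def __init__(self, seq_len: int, dim: int):
            super().__init__()
            self.mask = nn.Parameter(torch.ones(seq_len // 2 + 1, dim))

        def forward(self, x):  # x: (batch, time, dim)
            spec = torch.fft.rfft(x, dim=1)  # (batch, freq_bins, dim)
            return torch.fft.irfft(spec * self.mask, n=x.shape[1], dim=1)

    y = FrequencyFilter(seq_len=96, dim=32)(torch.randn(4, 96, 32))
    print(y.shape)  # torch.Size([4, 96, 32])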
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
- FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting [23.199388386249215]
We propose to combine the Transformer with a seasonal-trend decomposition method, in which the decomposition captures the global profile of the time series.
We exploit the fact that most time series tend to have a sparse representation in a well-known basis such as the Fourier basis.
Besides being more effective, the proposed method, termed Frequency Enhanced Decomposed Transformer (FEDformer), is more efficient than the standard Transformer.
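A hedged sketch of the two ingredients the summary names: a moving-average seasonal-trend split, and a sparse Fourier representation that keeps a random subset of modes. The mode count and kernel size are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def seasonal_trend_decompose(x: torch.Tensor, kernel: int = 25):
        """Moving average gives the trend; the residual is the seasonal part."""
        pad = kernel // 2  # x: (batch, time)
        padded = F.pad(x.unsqueeze(1), (pad, pad), mode="replicate")
        trend = F.avg_pool1d(padded, kernel, stride=1).squeeze(1)
        return x - trend, trend

    def sparse_fourier(x: torch.Tensor, n_modes: int = 16) -> torch.Tensor:
        """Keep a random subset of Fourier modes and drop the rest."""
        spec = torch.fft.rfft(x, dim=-1)
        keep = torch.randperm(spec.shape[-1])[:n_modes]
        sparse = torch.zeros_like(spec)
        sparse[..., keep] = spec[..., keep]
        return torch.fft.irfft(sparse, n=x.shape[-1], dim=-1)

    seasonal, trend = seasonal_trend_decompose(torch.randn(4, 96))
    approx = sparse_fourier(seasonal)  # (4, 96) sparse frequency approximation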
arXiv Detail & Related papers (2022-01-30T06:24:25Z)
- TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting [6.393659160890665]
We propose the concept of the tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that our TCCT architectures could greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)