Boosting X-formers with Structured Matrix for Long Sequence Time Series Forecasting
- URL: http://arxiv.org/abs/2405.12462v2
- Date: Wed, 22 May 2024 12:12:15 GMT
- Title: Boosting X-formers with Structured Matrix for Long Sequence Time Series Forecasting
- Authors: Zhicheng Zhang, Yong Wang, Shaoqi Tan, Bowei Xia, Yujie Luo,
- Abstract summary: We propose a novel architectural design for Transformer-based models in long sequence time series forecasting (LSTF) problems.
The framework aims to boost any well-designed model's efficiency without sacrificing its accuracy.
We observe an average performance improvement of 9.45% while achieving a significant reduction in model size by 46%.
- Score: 7.3758245014991255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models for long sequence time series forecasting (LSTF) problems have gained significant attention due to their exceptional forecasting precision. As the cornerstone of these models, the self-attention mechanism poses a challenge to efficient training and inference due to its quadratic time complexity. In this article, we propose a novel architectural design for Transformer-based models in LSTF, leveraging a substitution framework that incorporates Surrogate Attention Blocks and Surrogate FFN Blocks. The framework aims to boost any well-designed model's efficiency without sacrificing its accuracy. We further establish the equivalence of the Surrogate Attention Block to the self-attention mechanism in terms of both expressiveness and trainability. Through extensive experiments encompassing nine Transformer-based models across five time series tasks, we observe an average performance improvement of 9.45% while achieving a significant reduction in model size by 46%
Related papers
- Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework.
arXiv Detail & Related papers (2024-05-27T06:49:39Z) - UnitNorm: Rethinking Normalization for Transformers in Time Series [9.178527914585446]
Normalization techniques are crucial for enhancing Transformer models' performance and stability in time series analysis tasks.
We propose UnitNorm, a novel approach that scales input vectors by their norms and modulates attention patterns.
UnitNorm's effectiveness is demonstrated across diverse time series analysis tasks, including forecasting, classification, and anomaly detection.
arXiv Detail & Related papers (2024-05-24T19:58:25Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS)
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - TransNormerLLM: A Faster and Better Large Language Model with Improved
TransNormer [34.790081960470964]
We present TransNormerLLM, the first linear attention-based Large Language Model (LLM)
We make advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.
We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus.
arXiv Detail & Related papers (2023-07-27T16:45:33Z) - Fourier Transformer: Fast Long Range Modeling by Removing Sequence
Redundancy with FFT Operator [24.690247474891958]
Fourier Transformer is able to significantly reduce computational costs while retain the ability to inherit from various large pretrained models.
Our model achieves state-of-the-art performances among all transformer-based models on the long-range modeling benchmark LRA.
For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART.
arXiv Detail & Related papers (2023-05-24T12:33:06Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - LSG Attention: Extrapolation of pretrained Transformers to long
sequences [0.0]
We introduce the LSG architecture which relies on Local, Sparse and Global attention.
We show that LSG attention is fast, efficient and competitive in classification and summarization tasks on long documents.
We propose tools to train new models and adapt existing ones based on this mechanism.
arXiv Detail & Related papers (2022-10-13T13:10:41Z) - DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and
Quantization [75.72231742114951]
Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks.
These models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency.
We propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model.
arXiv Detail & Related papers (2022-03-21T18:04:25Z) - Autoformer: Decomposition Transformers with Auto-Correlation for
Long-Term Series Forecasting [68.86835407617778]
Autoformer is a novel decomposition architecture with an Auto-Correlation mechanism.
In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a relative improvement on six benchmarks.
arXiv Detail & Related papers (2021-06-24T13:43:43Z) - Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output during the loss are essential choices to improve performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.