Related papers: Enhancing Transformer-based models for Long Sequence Time Series Forecasting via Structured Matrix

Enhancing Transformer-based models for Long Sequence Time Series Forecasting via Structured Matrix

URL: http://arxiv.org/abs/2405.12462v4
Date: Mon, 16 Dec 2024 13:47:34 GMT
Title: Enhancing Transformer-based models for Long Sequence Time Series Forecasting via Structured Matrix
Authors: Zhicheng Zhang, Yong Wang, Shaoqi Tan, Bowei Xia, Yujie Luo,
Abstract summary: Self-attention mechanism as the core component of Transformer-based models exhibits great potential.<n>We propose a novel architectural framework that enhances Transformer-based models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB)<n>The framework reduces both time and space complexity by the replacement of the self-attention and feed-forward layers with SAB and SFB.
Score: 7.3758245014991255
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, Transformer-based models for long sequence time series forecasting have demonstrated promising results. The self-attention mechanism as the core component of these Transformer-based models exhibits great potential in capturing various dependencies among data points. Despite these advancements, it has been a subject of concern to improve the efficiency of the self-attention mechanism. Unfortunately, current specific optimization methods are facing the challenges in applicability and scalability for the future design of long sequence time series forecasting models. Hence, in this article, we propose a novel architectural framework that enhances Transformer-based models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB). The framework reduces both time and space complexity by the replacement of the self-attention and feed-forward layers with SAB and SFB while maintaining their expressive power and architectural advantages. The equivalence of this substitution is fully demonstrated. The extensive experiments on 10 Transformer-based models across five distinct time series tasks demonstrate an average performance improvement of 12.4%, alongside 61.3% reduction in parameter counts.

Related papers

Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations [2.2091590689610823]
We re-purpose the Transformer architecture to model both cross-time and cross-variate dependencies. Our method achieves state-of-the-art performance across 13 real-world datasets, delivering performance improvements up to 20.7% over original models.
arXiv Detail & Related papers (2025-05-01T04:59:05Z)
Ister: Inverted Seasonal-Trend Decomposition Transformer for Explainable Multivariate Time Series Forecasting [10.32586981170693]
Inverted Seasonal-Trend Decomposition Transformer (Ister) We introduce a novel Dot-attention mechanism that improves interpretability, computational efficiency, and predictive accuracy. Ister enables intuitive visualization of component contributions, shedding lights on model's decision process and enhancing transparency in prediction results.
arXiv Detail & Related papers (2024-12-25T06:37:19Z)
LSEAttention is All You Need for Time Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. We introduce LATST, a novel approach designed to mitigate entropy collapse and training instability common challenges in Transformer-based time series forecasting.
arXiv Detail & Related papers (2024-10-31T09:09:39Z)
Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting [8.841114905151152]
Local Attention Mechanism (LAM) is an efficient attention mechanism tailored for time series analysis. LAM exploits the continuity properties of time series to reduce the number of attention scores computed. We present an algorithm for implementing LAM in algebra tensor that runs in time and memory O(nlogn)
arXiv Detail & Related papers (2024-10-04T11:32:02Z)
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences. We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles. Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query. Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
arXiv Detail & Related papers (2024-06-24T15:55:59Z)
UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens. Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches. We introduce a new architecture, Cross-Attention-only Time Series transformer (CATS) Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
arXiv Detail & Related papers (2024-05-27T06:49:39Z)
UnitNorm: Rethinking Normalization for Transformers in Time Series [9.178527914585446]
Normalization techniques are crucial for enhancing Transformer models' performance and stability in time series analysis tasks. We propose UnitNorm, a novel approach that scales input vectors by their norms and modulates attention patterns. UnitNorm's effectiveness is demonstrated across diverse time series analysis tasks, including forecasting, classification, and anomaly detection.
arXiv Detail & Related papers (2024-05-24T19:58:25Z)
Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications. Transformers' key feature, the attention mechanism, dynamically fusing embeddings to enhance data representation, often relegating attention weights to a byproduct role. Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
Correlated Attention in Transformers for Multivariate Time Series [22.542109523780333]
We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies, and can be seamlessly integrated within the encoder blocks of existing Transformers. In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level. This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
arXiv Detail & Related papers (2023-11-20T17:35:44Z)
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI type Transformer in time series forecasting. First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals. Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions. Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications. The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate. There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task. It exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tacking the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism. We propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
ETSformer: Exponential Smoothing Transformers for Time-series Forecasting [35.76867542099019]
We propose ETSFormer, a novel time-series Transformer architecture, which exploits the principle of exponential smoothing in improving Transformers for time-series forecasting. In particular, inspired by the classical exponential smoothing methods in time-series forecasting, we propose the novel exponential smoothing attention (ESA) and frequency attention (FA) to replace the self-attention mechanism in vanilla Transformers, thus improving both accuracy and efficiency.
arXiv Detail & Related papers (2022-02-03T02:50:44Z)
Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. We propose Linformer and Informer to reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
arXiv Detail & Related papers (2021-12-10T06:58:05Z)
Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting [68.86835407617778]
Autoformer is a novel decomposition architecture with an Auto-Correlation mechanism. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a relative improvement on six benchmarks.
arXiv Detail & Related papers (2021-06-24T13:43:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.