Sensorformer: Cross-patch attention with global-patch compression is effective for high-dimensional multivariate time series forecasting
- URL: http://arxiv.org/abs/2501.03284v1
- Date: Mon, 06 Jan 2025 03:14:47 GMT
- Title: Sensorformer: Cross-patch attention with global-patch compression is effective for high-dimensional multivariate time series forecasting
- Authors: Liyang Qin, Xiaoli Wang, Chunhua Yang, Huaiwen Zou, Haochuan Zhang,
- Abstract summary: We propose a new Transformer, Sensorformer, which first compresses the global patch information and then simultaneously extracts cross-variable and cross-time dependencies from the compressed representations.
Sensorformer can effectively capture the correct inter-variable correlations and causal relationships, even in the presence of dynamic causal lags between variables.
- Score: 12.103678233732584
- License:
- Abstract: Among the existing Transformer-based multivariate time series forecasting methods, iTransformer, which treats each variable sequence as a token and only explicitly extracts cross-variable dependencies, and PatchTST, which adopts a channel-independent strategy and only explicitly extracts cross-time dependencies, both significantly outperform most Channel-Dependent Transformer that simultaneously extract cross-time and cross-variable dependencies. This indicates that existing Transformer-based multivariate time series forecasting methods still struggle to effectively fuse these two types of information. We attribute this issue to the dynamic time lags in the causal relationships between different variables. Therefore, we propose a new multivariate time series forecasting Transformer, Sensorformer, which first compresses the global patch information and then simultaneously extracts cross-variable and cross-time dependencies from the compressed representations. Sensorformer can effectively capture the correct inter-variable correlations and causal relationships, even in the presence of dynamic causal lags between variables, while also reducing the computational complexity of pure cross-patch self-attention from $O(D^2 \cdot Patch\_num^2 \cdot d\_model)$ to $O(D^2 \cdot Patch\_num \cdot d\_model)$. Extensive comparative and ablation experiments on 9 mainstream real-world multivariate time series forecasting datasets demonstrate the superiority of Sensorformer. The implementation of Sensorformer, following the style of the Time-series-library and scripts for reproducing the main results, is publicly available at https://github.com/BigYellowTiger/Sensorformer
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - Rough Transformers: Lightweight and Continuous Time Series Modelling through Signature Patching [46.58170057001437]
We introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences.
We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts.
arXiv Detail & Related papers (2024-05-31T14:00:44Z) - Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers [55.475142494272724]
Time series prediction is crucial for understanding and forecasting complex dynamics in various domains.
We introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions.
The model consistently delivers state-of-the-art performance across various real-world datasets.
arXiv Detail & Related papers (2024-05-22T16:41:21Z) - VCformer: Variable Correlation Transformer with Inherent Lagged Correlation for Multivariate Time Series Forecasting [1.5165632546654102]
We propose Variable Correlation Transformer (VCformer) to mine the correlations among variables.
VCA calculates and integrates the cross-correlation scores corresponding to different lags between queries and keys.
Inspired by Koopman dynamics theory, we also develop Koopman Temporal Detector (KTD) to better address the non-stationarity in time series.
arXiv Detail & Related papers (2024-05-19T07:39:22Z) - EdgeConvFormer: Dynamic Graph CNN and Transformer based Anomaly
Detection in Multivariate Time Series [7.514010315664322]
We propose a novel anomaly detection method, named EdgeConvFormer, which integrates stacked Time2vec embedding, dynamic graph CNN, and Transformer to extract global and local spatial-time information.
Experiments demonstrate that EdgeConvFormer can learn the spatial-temporal modeling from multivariate time series data and achieve better anomaly detection performance than the state-of-the-art approaches on many real-world datasets of different scales.
arXiv Detail & Related papers (2023-12-04T08:38:54Z) - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z) - CARD: Channel Aligned Robust Blend Transformer for Time Series
Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI type Transformer in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z) - A Time Series is Worth 64 Words: Long-term Forecasting with Transformers [4.635547236305835]
We propose an efficient design of Transformer-based models for time series forecasting and self-supervised representation learning.
It is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer.
PatchTST can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models.
arXiv Detail & Related papers (2022-11-27T05:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.