Transformers with Sparse Attention for Granger Causality
- URL: http://arxiv.org/abs/2411.13264v1
- Date: Wed, 20 Nov 2024 12:34:06 GMT
- Title: Transformers with Sparse Attention for Granger Causality
- Authors: Riya Mahesh, Rahul Vashisht, Chandrashekar Lakshminarayanan
- Abstract summary: Deep learning-based methods such as transformers are increasingly used to capture temporal dynamics and causal relationships beyond mere correlations.
Recent works suggest the self-attention weights of transformers as a useful indicator of causal links.
We propose a novel modification to the self-attention module to establish causal links between the variables of time-series data with varying lag dependencies.
- Score: 0.8249694498830561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal causal analysis aims to understand the underlying causes behind variables observed over time. Deep learning-based methods such as transformers are increasingly used to capture temporal dynamics and causal relationships beyond mere correlations. Recent works suggest the self-attention weights of transformers as a useful indicator of causal links. We leverage this to propose a novel modification to the self-attention module that establishes causal links between the variables of multivariate time-series data with varying lag dependencies. Our Sparse Attention Transformer captures causal relationships using a two-fold approach: it first performs temporal attention, followed by attention between the variables across the time steps, masking each variable individually to compute Granger causality indices. The key novelty in our approach is the model's ability to assert importance and pick the most significant past time instances for its prediction task, rather than relying on a manually specified fixed time lag. We demonstrate the effectiveness of our approach through extensive experiments on several synthetic benchmark datasets. Furthermore, we compare the performance of our model with the traditional Vector Autoregression-based Granger causality method, which assumes a fixed lag length.
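The following is a minimal sketch of the masking idea described above: a Granger causality index for a candidate cause is obtained by comparing one-step-ahead prediction errors for the effect variable with and without the cause in the input history. The predictor here is a fixed-lag ridge regression stand-in (essentially the VAR-style baseline the paper compares against), not the Sparse Attention Transformer itself; the function names, lag length, and regularization are illustrative assumptions.

```python
import numpy as np

def lagged_design(x, max_lag):
    """Stack the past `max_lag` values of every variable as regression features."""
    T, _ = x.shape
    feats = np.hstack([x[max_lag - l: T - l] for l in range(1, max_lag + 1)])
    return feats, x[max_lag:]

def prediction_error(x, target_idx, masked_idx=None, max_lag=5, ridge=1e-3):
    """Mean squared one-step-ahead error for one target variable,
    optionally with one input variable masked (zeroed) out of the history."""
    x = x.copy()
    if masked_idx is not None:
        x[:, masked_idx] = 0.0
    X, Y = lagged_design(x, max_lag)
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y[:, target_idx])
    resid = Y[:, target_idx] - X @ w
    return float(np.mean(resid ** 2))

def granger_index(x, cause, effect, max_lag=5):
    """log(error without cause / error with cause); values > 0 suggest a causal link."""
    full = prediction_error(x, effect, None, max_lag)
    restricted = prediction_error(x, effect, cause, max_lag)
    return np.log(restricted / full)

# Toy example: variable 0 drives variable 1 with a lag of 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
x[2:, 1] += 0.8 * x[:-2, 0]
print(granger_index(x, cause=0, effect=1))  # noticeably > 0
print(granger_index(x, cause=1, effect=0))  # close to 0
```

The Sparse Attention Transformer replaces the fixed-lag regression in this sketch with learned temporal and variable attention, so the relevant lags are selected by the model rather than specified in advance.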
Related papers
- Transforming Causality: Transformer-Based Temporal Causal Discovery with Prior Knowledge Integration [8.412444798554143]
We introduce a novel framework for temporal causal discovery and inference.
It addresses two key challenges: complex nonlinear dependencies and spurious correlations.
Our method significantly outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2025-08-21T19:19:11Z) - Rethinking Remaining Useful Life Prediction with Scarce Time Series Data: Regression under Indirect Supervision [4.335413713700667]
We introduce a unified framework called parameterized static regression, which takes single points as inputs for regression of target values.
Our method demonstrates competitive performance in prediction accuracy when dealing with highly scarce time series data.
arXiv Detail & Related papers (2025-04-12T13:14:35Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies.
We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis.
Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
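A minimal sketch of weighted causal attention, assuming the mechanism amounts to combining a causal mask with a smooth power-law decay over the query-key time distance; the exponent `alpha` and the log-space formulation are illustrative choices, not details taken from the Powerformer paper.

```python
import torch
import torch.nn.functional as F

def weighted_causal_attention(q, k, v, alpha=1.0):
    """q, k, v: (batch, seq_len, dim). Causal attention whose allowed positions
    are down-weighted by a smooth power-law decay in time distance."""
    seq_len, dim = q.shape[1], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / dim ** 0.5           # (batch, T, T)
    t = torch.arange(seq_len)
    dist = (t[:, None] - t[None, :]).clamp(min=0).float()   # distance from query to past key
    log_decay = (1.0 + dist).pow(-alpha).log()               # heavy-tailed decay, log space
    mask = torch.where(t[None, :] <= t[:, None], log_decay,
                       torch.full_like(log_decay, float("-inf")))
    weights = F.softmax(scores + mask, dim=-1)               # future keys get zero weight
    return weights @ v

x = torch.randn(2, 16, 8)
print(weighted_causal_attention(x, x, x, alpha=0.5).shape)   # torch.Size([2, 16, 8])
```

Adding the log of the decay to the scores before the softmax is equivalent to multiplying the attention weights by the decay and renormalizing, which keeps the reweighting numerically stable.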
arXiv Detail & Related papers (2025-02-10T04:42:11Z) - Sensorformer: Cross-patch attention with global-patch compression is effective for high-dimensional multivariate time series forecasting [12.103678233732584]
We propose a new Transformer, Sensorformer, which first compresses the global patch information and then simultaneously extracts cross-variable and cross-time dependencies from the compressed representations.
Sensorformer can effectively capture the correct inter-variable correlations and causal relationships, even in the presence of dynamic causal lags between variables.
arXiv Detail & Related papers (2025-01-06T03:14:47Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z) - DLFormer: Enhancing Explainability in Multivariate Time Series Forecasting using Distributed Lag Embedding [4.995397953581609]
This study introduces DLFormer, an attention-based architecture integrated with distributed lag embedding.
It achieves superior performance compared to existing high-performing attention-based models.
arXiv Detail & Related papers (2024-08-29T20:39:54Z) - VCformer: Variable Correlation Transformer with Inherent Lagged Correlation for Multivariate Time Series Forecasting [1.5165632546654102]
We propose Variable Correlation Transformer (VCformer) to mine the correlations among variables.
VCA calculates and integrates the cross-correlation scores corresponding to different lags between queries and keys.
Inspired by Koopman dynamics theory, we also develop Koopman Temporal Detector (KTD) to better address the non-stationarity in time series.
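A hedged sketch of the lagged-correlation idea: each pair of variables is scored by aggregating normalized cross-correlations over a range of lags, computed with an FFT. This mirrors the description of VCA only at a high level; the helper names and the max-over-lags aggregation are assumptions.

```python
import numpy as np

def lagged_cross_correlation(a, b, max_lag):
    """Normalized cross-correlation of two equal-length series for lags 0..max_lag."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    n = len(a)
    f = np.fft.rfft(a, 2 * n)
    g = np.fft.rfft(b, 2 * n)
    corr = np.fft.irfft(f * np.conj(g), 2 * n)[: max_lag + 1] / n
    return corr                                   # corr[l] ~ correlation of a[t] with b[t - l]

def variable_correlation_scores(x, max_lag=8):
    """x: (T, d). Returns a (d, d) matrix aggregating lagged correlations per pair."""
    d = x.shape[1]
    scores = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            scores[i, j] = np.abs(lagged_cross_correlation(x[:, i], x[:, j], max_lag)).max()
    return scores

x = np.random.default_rng(0).normal(size=(300, 3))
x[4:, 2] += 0.9 * x[:-4, 0]                       # variable 0 leads variable 2 by 4 steps
print(variable_correlation_scores(x).round(2))    # the (2, 0) entry stands out
```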
arXiv Detail & Related papers (2024-05-19T07:39:22Z) - TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables [75.83318701911274]
TimeXer ingests external information to enhance the forecasting of endogenous variables.
TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks.
arXiv Detail & Related papers (2024-02-29T11:54:35Z) - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art on challenging real-world datasets.
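A minimal sketch of the inverted tokenization, assuming the published description: each variable's entire lookback window is embedded as a single token, and self-attention runs across variables rather than across time steps. Layer sizes and the single-block structure are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

class InvertedEncoderLayer(nn.Module):
    def __init__(self, lookback, horizon, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(lookback, d_model)             # whole series -> one token
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                nn.Linear(d_model, d_model))
        self.head = nn.Linear(d_model, horizon)               # token -> that variable's forecast

    def forward(self, x):                                      # x: (batch, lookback, n_vars)
        tokens = self.embed(x.transpose(1, 2))                 # (batch, n_vars, d_model)
        attn_out, _ = self.attn(tokens, tokens, tokens)        # attention across variables
        tokens = tokens + attn_out
        tokens = tokens + self.ff(tokens)
        return self.head(tokens).transpose(1, 2)               # (batch, horizon, n_vars)

model = InvertedEncoderLayer(lookback=96, horizon=24)
print(model(torch.randn(8, 96, 7)).shape)                      # torch.Size([8, 24, 7])
```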
arXiv Detail & Related papers (2023-10-10T13:44:09Z) - CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
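One simple way to produce tokens at different resolutions, sketched below as average pooling of adjacent tokens; this illustrates the general idea only and is not CARD's actual token blend module.

```python
import torch
import torch.nn.functional as F

def multi_resolution_tokens(tokens, scales=(1, 2, 4)):
    """tokens: (batch, n_tokens, d_model). Returns one tensor per scale, where
    scale s averages every s adjacent tokens into one coarser token."""
    outs = []
    for s in scales:
        if s == 1:
            outs.append(tokens)
        else:
            pooled = F.avg_pool1d(tokens.transpose(1, 2), kernel_size=s, stride=s)
            outs.append(pooled.transpose(1, 2))                # (batch, n_tokens // s, d_model)
    return outs

toks = torch.randn(8, 16, 32)
print([t.shape[1] for t in multi_resolution_tokens(toks)])     # [16, 8, 4]
```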
arXiv Detail & Related papers (2023-05-20T05:16:31Z) - Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce the class of efficient Transformers named Regularized Transformers (Reguformers).
The focus of our experiments is on oil & gas data, namely, well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z) - A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting [4.666618110838523]
Time series forecasting is widely used in equipment life-cycle forecasting, weather forecasting, traffic flow forecasting, and other fields.
Some scholars have tried to apply Transformers to time series forecasting because of their powerful parallel training ability.
The existing Transformer methods do not pay enough attention to the small time segments that play a decisive role in prediction.
arXiv Detail & Related papers (2022-02-23T10:33:12Z) - Long-Range Transformers for Dynamic Spatiotemporal Forecasting [16.37467119526305]
Methods based on graph neural networks explicitly model variable relationships.
Long-Range Transformers can learn interactions between space, time, and value information jointly along this extended sequence.
arXiv Detail & Related papers (2021-09-24T22:11:46Z) - Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting [68.86835407617778]
Autoformer is a novel decomposition architecture with an Auto-Correlation mechanism.
In long-term forecasting, Autoformer yields state-of-the-art accuracy, with relative improvements across six benchmarks.
arXiv Detail & Related papers (2021-06-24T13:43:43Z)
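A hedged sketch of an auto-correlation mechanism: the dominant periods of a series are estimated from its autocorrelation (computed via FFT), and time-delayed copies of the series are aggregated with weights derived from those autocorrelation values. The top-k selection and softmax weighting are simplified assumptions rather than Autoformer's exact aggregation.

```python
import numpy as np

def autocorrelation(x):
    """Autocorrelation of a 1-D series, computed with a zero-padded FFT."""
    n = len(x)
    f = np.fft.rfft(x - x.mean(), 2 * n)
    acf = np.fft.irfft(f * np.conj(f), 2 * n)[:n]
    return acf / acf[0]

def autocorrelation_aggregate(x, top_k=3):
    """Blend the series with its own delayed copies at the top-k autocorrelation lags."""
    acf = autocorrelation(x)
    lags = np.argsort(acf[1:])[::-1][:top_k] + 1              # strongest non-zero lags
    weights = np.exp(acf[lags]) / np.exp(acf[lags]).sum()     # softmax over selected lags
    return sum(w * np.roll(x, lag) for w, lag in zip(weights, lags))

t = np.arange(400)
x = np.sin(2 * np.pi * t / 50) + 0.1 * np.random.default_rng(0).normal(size=400)
print(autocorrelation(x)[[25, 50, 100]])                      # strong peak near the true period of 50
print(autocorrelation_aggregate(x).shape)                     # (400,)
```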
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.