GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus
- URL: http://arxiv.org/abs/2510.10631v1
- Date: Sun, 12 Oct 2025 14:22:32 GMT
- Title: GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus
- Authors: Zhaolin Hu, Kun Li, Hehe Fan, Yi Yang
- Abstract summary: We propose a novel framework that enhances both the rank and focus of attention. Specifically, we enhance linear attention by attaching a gated local graph network branch to the value matrix. We also introduce a learnable log-power function into the attention scores to reduce entropy and sharpen focus.
- Score: 32.63390871016499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear attention mechanisms have emerged as efficient alternatives to full self-attention in Graph Transformers, offering linear time complexity. However, existing linear attention models often suffer from a significant drop in expressiveness due to low-rank projection structures and overly uniform attention distributions. We theoretically prove that these properties reduce the class separability of node representations, limiting the model's classification ability. To address this, we propose a novel hybrid framework that enhances both the rank and focus of attention. Specifically, we enhance linear attention by attaching a gated local graph network branch to the value matrix, thereby increasing the rank of the resulting attention map. Furthermore, to alleviate the excessive smoothing effect inherent in linear attention, we introduce a learnable log-power function into the attention scores to reduce entropy and sharpen focus. We theoretically show that this function decreases entropy in the attention distribution, enhancing the separability of learned embeddings. Extensive experiments on both homophilic and heterophilic graph benchmarks demonstrate that our method achieves competitive performance while preserving the scalability of linear attention.
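The abstract describes two mechanisms: a gated local graph branch attached to the value path, intended to raise the rank of the effective attention map, and a learnable log-power transform of the attention scores, intended to lower entropy and sharpen focus. The sketch below is only an illustration of these two ideas, not the authors' implementation; the ELU+1 feature map, the one-hop local branch, the sigmoid gate, and all module and parameter names are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LogPowerLinearAttention(nn.Module):
    """Illustrative sketch (not the paper's released code): linear attention
    with a learnable log-power sharpening of the feature-mapped scores and a
    gated local-aggregation branch on the value path."""

    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Learnable exponent p = exp(log_p) > 0 for the power sharpening
        # (assumed parameterization).
        self.log_p = nn.Parameter(torch.zeros(1))
        # Gate mixing the global linear-attention output with the local branch.
        self.gate = nn.Linear(2 * dim, dim)
        self.local_proj = nn.Linear(dim, dim)

    def feature_map(self, x):
        # Positive feature map (ELU + 1), a common choice for linear attention;
        # the paper may use a different map.
        phi = F.elu(x) + 1.0
        # Hypothetical log-power sharpening: phi ** p, computed in log space
        # for numerical stability. Larger p concentrates the scores.
        p = self.log_p.exp()
        return torch.exp(p * torch.log(phi + 1e-6))

    def forward(self, x, adj):
        # x: (N, dim) node features; adj: (N, N) normalized adjacency matrix.
        q = self.feature_map(self.q_proj(x))            # (N, d)
        k = self.feature_map(self.k_proj(x))            # (N, d)
        v = self.v_proj(x)                              # (N, d)

        # Linear attention: O(N * d^2) instead of O(N^2 * d).
        kv = k.t() @ v                                  # (d, d)
        z = q @ k.sum(dim=0, keepdim=True).t() + 1e-6   # (N, 1) normalizer
        global_out = (q @ kv) / z                       # (N, d)

        # Gated local graph branch on the value path; a single one-hop
        # aggregation stands in for the paper's local graph network.
        local_out = adj @ self.local_proj(v)            # (N, d)
        g = torch.sigmoid(self.gate(torch.cat([global_out, local_out], dim=-1)))
        return g * global_out + (1.0 - g) * local_out
```

Adding the rank-deficient global term and the locally structured term before the gate is what would plausibly raise the rank of the combined map relative to plain linear attention, while the exponent p > 1 is the knob that sharpens otherwise near-uniform score distributions.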
Related papers
- SLA2: Sparse-Linear Attention with Learnable Routing and QAT [86.22100800353991]
Experiments show that SLA2 can achieve 97% attention sparsity and deliver an 18.6x attention speedup while preserving generation quality.
arXiv Detail & Related papers (2026-02-13T07:16:02Z) - HopFormer: Sparse Graph Transformers with Explicit Receptive Field Control [7.178718630094309]
We introduce HopFormer, a graph Transformer that injects structure exclusively through head-specific n-hop masked sparse attention. We show that our approach achieves competitive or superior performance across diverse graph structures.
arXiv Detail & Related papers (2026-02-02T16:09:58Z) - Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification [81.06278257153835]
We propose a graph rewiring method that balances structural bottleneck reduction and graph property preservation. Our method generates graphs with enhanced connectivity while maintaining sparsity and largely preserving the original graph spectrum.
arXiv Detail & Related papers (2025-06-19T08:01:00Z) - Log-Linear Attention [81.09631871212211]
This paper develops log-linear attention, an attention mechanism that balances linear attention's efficiency and the expressiveness of softmax attention. We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length. Log-linear attention is a general framework and can be applied on top of existing linear attention variants.
arXiv Detail & Related papers (2025-06-05T08:44:51Z) - PolaFormer: Polarity-aware Linear Attention for Vision Transformers [16.35834984488344]
Linear attention has emerged as a promising alternative to softmax-based attention. We propose a polarity-aware linear attention mechanism that explicitly models both same-signed and opposite-signed query-key interactions. For simplicity, and recognizing the distinct contributions of each dimension, we employ a learnable power function for rescaling.
arXiv Detail & Related papers (2025-01-25T03:46:35Z) - An end-to-end attention-based approach for learning on graphs [8.552020965470113]
Transformer-based architectures for learning on graphs are motivated by attention as an effective learning mechanism. We propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node and graph-level tasks.
arXiv Detail & Related papers (2024-02-16T16:20:11Z) - Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification [67.35058947477631]
We introduce Dual-Prism (DP) augmentation methods, including DP-Noise and DP-Mask, which retain essential graph properties while diversifying augmented graphs. Extensive experiments validate the efficiency of our approach, providing a new and promising direction for graph data augmentation.
arXiv Detail & Related papers (2024-01-18T12:58:53Z) - Linear Log-Normal Attention with Unbiased Concentration [3.034257650900382]
We study the self-attention mechanism by analyzing the distribution of the attention matrix and its concentration ability.
We propose instruments to measure these quantities and introduce a novel self-attention mechanism, Linear Log-Normal Attention.
Our experimental results on popular natural language benchmarks reveal that our proposed Linear Log-Normal Attention outperforms other linearized attention alternatives.
arXiv Detail & Related papers (2023-11-22T17:30:41Z) - FLatten Transformer: Vision Transformer using Focused Linear Attention [80.61335173752146]
Linear attention offers a much more efficient alternative with its linear complexity.
Current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead.
We propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness.
arXiv Detail & Related papers (2023-08-01T10:37:12Z) - Causally-guided Regularization of Graph Attention Improves Generalizability [69.09877209676266]
We introduce CAR, a general-purpose regularization framework for graph attention networks.
CAR aligns the attention mechanism with the causal effects of active interventions on graph connectivity.
For graphs at the scale of social media networks, a CAR-guided graph rewiring approach could combine the scalability of graph convolutional methods with the higher performance of graph attention.
arXiv Detail & Related papers (2022-10-20T01:29:10Z)