Related papers: AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers

AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers

URL: http://arxiv.org/abs/2405.03481v1
Date: Mon, 6 May 2024 13:53:09 GMT
Title: AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers
Authors: Wenhao Zhu, Guojie Song, Liang Wang, Shaoguo Liu,
Abstract summary: We propose AnchorGT, a novel attention architecture for Graph Transformers (GTs) with global receptive field and almost linear complexity. Inspired by anchor-based GNNs, we employ structurally important $k$-dominating node set as anchors and design an attention mechanism that focuses on the relationship between individual nodes and anchors. With its intuitive design, AnchorGT can easily replace the attention module in various GT models with different network architectures.
Score: 35.04198789195943
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graph Transformers (GTs) have significantly advanced the field of graph representation learning by overcoming the limitations of message-passing graph neural networks (GNNs) and demonstrating promising performance and expressive power. However, the quadratic complexity of self-attention mechanism in GTs has limited their scalability, and previous approaches to address this issue often suffer from expressiveness degradation or lack of versatility. To address this issue, we propose AnchorGT, a novel attention architecture for GTs with global receptive field and almost linear complexity, which serves as a flexible building block to improve the scalability of a wide range of GT models. Inspired by anchor-based GNNs, we employ structurally important $k$-dominating node set as anchors and design an attention mechanism that focuses on the relationship between individual nodes and anchors, while retaining the global receptive field for all nodes. With its intuitive design, AnchorGT can easily replace the attention module in various GT models with different network architectures and structural encodings, resulting in reduced computational overhead without sacrificing performance. In addition, we theoretically prove that AnchorGT attention can be strictly more expressive than Weisfeiler-Lehman test, showing its superiority in representing graph structures. Our experiments on three state-of-the-art GT models demonstrate that their AnchorGT variants can achieve better results while being faster and significantly more memory efficient.

Related papers

GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning [54.60090631330295]
Graph Neural Networks (GNNs) have demonstrated significant success in learning from graph-structured data but often struggle on heterophilous graphs.<n>We propose GLANCE, a novel framework that integrates logic-guided reasoning, dynamic graph refinement, and adaptive clustering to enhance graph representation learning.
arXiv Detail & Related papers (2025-07-24T15:45:26Z)
Unlocking the Potential of Classic GNNs for Graph-level Tasks: Simple Architectures Meet Excellence [7.14327815822376]
We explore the untapped potential of Graph Neural Networks (GNNs) through an enhanced framework, GNN+. We conduct a systematic evaluation of three classic GNNs, namely GCN, GIN, and GatedGCN, enhanced by the GNN+ framework across 14 well-known graph-level datasets. Our results show that, contrary to the prevailing belief, classic GNNs excel in graph-level tasks, securing top three rankings across all datasets and achieving first place in eight, while also demonstrating greater efficiency than GTs.
arXiv Detail & Related papers (2025-02-13T12:24:23Z)
Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification [4.129489934631072]
Graph neural networks excel at graph representation learning but struggle with heterophilous data and long-range dependencies. We propose GNNMoE, a universal model architecture for node classification. We show that GNNMoE performs exceptionally well across various types of graph data, effectively alleviating the over-smoothing issue and global noise.
arXiv Detail & Related papers (2024-12-11T08:35:13Z)
Rethinking Graph Transformer Architecture Design for Node Classification [4.497245600377944]
Graph Transformer (GT) is a special type of Graph Neural Networks (GNNs) that utilize multi-head attention to facilitate high-order message passing. In this work, we conduct observational experiments to explore the adaptability of the GT architecture in node classification tasks. Our proposed GT architecture can effectively adapt to node classification tasks without being affected by global noise and computational efficiency limitations.
arXiv Detail & Related papers (2024-10-15T02:08:16Z)
SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs. We show that one-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning. It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks. GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks [2.7523980737007414]
We propose TADA, an efficient and effective front-mounted data augmentation framework for graph neural networks (GNNs) on high-degree graphs (HDGs) Under the hood, TADA includes two key modules: (i) feature expansion with structure embeddings, and (ii) topology- and attribute-aware graph sparsification. TADA considerably improves the predictive performance of mainstream GNN models on 8 real homophilic/heterophilic HDGs in terms of node classification.
arXiv Detail & Related papers (2024-06-08T14:14:19Z)
Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints. A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model. We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
HINormer: Representation Learning On Heterogeneous Information Networks with Graph Transformer [29.217820912610602]
Graph Transformers (GTs) have been proposed which work in the paradigm that allows message passing to a larger coverage even across the whole graph. The investigation of GTs on heterogeneous information networks (HINs) is still under-exploited. We propose a novel model named HINormer, which capitalizes on a larger-range aggregation mechanism for node representation learning.
arXiv Detail & Related papers (2023-02-22T12:25:07Z)
Gradient Gating for Deep Multi-Rate Learning on Graphs [62.25886489571097]
We present Gradient Gating (G$2$), a novel framework for improving the performance of Graph Neural Networks (GNNs) Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph.
arXiv Detail & Related papers (2022-10-02T13:19:48Z)
Heterogeneous Graph Transformer [49.675064816860505]
Heterogeneous Graph Transformer (HGT) architecture for modeling Web-scale heterogeneous graphs. To handle dynamic heterogeneous graphs, we introduce the relative temporal encoding technique into HGT. To handle Web-scale graph data, we design the heterogeneous mini-batch graph sampling algorithm---HGSampling---for efficient and scalable training.
arXiv Detail & Related papers (2020-03-03T04:49:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.