Less is More: on the Over-Globalizing Problem in Graph Transformers
- URL: http://arxiv.org/abs/2405.01102v2
- Date: Fri, 24 May 2024 08:53:13 GMT
- Title: Less is More: on the Over-Globalizing Problem in Graph Transformers
- Authors: Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi
- Abstract summary: The global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes.
We show that the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened.
We propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer) to prevent the over-globalizing problem.
- Score: 34.52455014631614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Graph Transformer, due to its global attention mechanism, has emerged as a new tool for dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformers by presenting both empirical evidence and theoretical analysis, i.e., the current attention mechanism overly focuses on distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened. We then propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), comprising inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes. Moreover, collaborative training is proposed to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs validate the effectiveness of the proposed CoBFormer.
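The abstract sketches CoBFormer's core idea: restrict full attention to nodes within a cluster (intra-cluster Transformer) and let clusters exchange information through a coarser attention over cluster-level summaries (inter-cluster Transformer). Below is a minimal PyTorch sketch of that bi-level attention pattern; the cluster partition, the mean-pooled cluster summaries, and the linear fusion of the two views are illustrative assumptions, not the authors' exact CoBFormer design.

```python
# Minimal sketch of a bi-level (intra-cluster + inter-cluster) attention layer,
# following the idea described in the abstract. The cluster assignment, the
# mean-pooled cluster summaries, and the linear fusion are assumptions made for
# illustration; they are not CoBFormer's exact components.
import torch
import torch.nn as nn


class BiLevelAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.intra_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # assumed fusion of local and global views

    def forward(self, x: torch.Tensor, cluster: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features; cluster: (N,) cluster id for each node
        local_out = torch.zeros_like(x)
        num_clusters = int(cluster.max().item()) + 1

        # Intra-cluster attention: full attention restricted to nodes of one cluster,
        # which keeps nearby nodes from being drowned out by distant ones.
        summaries = []
        for c in range(num_clusters):
            idx = (cluster == c).nonzero(as_tuple=True)[0]
            xc = x[idx].unsqueeze(0)                      # (1, n_c, dim)
            attn_out, _ = self.intra_attn(xc, xc, xc)
            local_out[idx] = attn_out.squeeze(0)
            summaries.append(xc.mean(dim=1))              # assumed mean-pool summary

        # Inter-cluster attention: attention over cluster summaries only, giving each
        # node a coarse global view without all-pairs attention between nodes.
        s = torch.cat(summaries, dim=0).unsqueeze(0)      # (1, C, dim)
        s_out, _ = self.inter_attn(s, s, s)
        global_view = s_out.squeeze(0)[cluster]           # broadcast summaries to nodes

        return self.fuse(torch.cat([local_out, global_view], dim=-1))


if __name__ == "__main__":
    x = torch.randn(12, 32)              # 12 nodes with 32-dim features
    cluster = torch.arange(12) % 3       # toy partition into 3 clusters
    model = BiLevelAttention(dim=32)
    print(model(x, cluster).shape)       # torch.Size([12, 32])
```

In the paper, how the partition is produced and the collaborative-training component mentioned in the abstract sit on top of this idea; both are omitted here.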
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer attention can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - Graph Triple Attention Network: A Decoupled Perspective [8.958483386270638]
Graph Transformers face two primary challenges: multi-view chaos and local-global chaos.
We propose a high-level decoupled perspective of GTs, breaking them down into three components and two interaction levels.
We design a decoupled graph triple attention network named DeGTA, which separately computes multi-view attentions and adaptively integrates multi-view local and global information.
arXiv Detail & Related papers (2024-08-14T16:29:07Z) - Learning a Mini-batch Graph Transformer via Two-stage Interaction Augmentation [34.969019293698885]
Mini-batch Graph Transformer (MGT) has demonstrated significant advantages in semi-supervised node prediction tasks.
The limited number of nodes in each mini-batch restricts the model's capacity to capture the global characteristics of the graph.
We propose LGMformer, a novel MGT model that employs a two-stage augmented interaction strategy.
arXiv Detail & Related papers (2024-07-13T14:42:22Z) - Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, and we also scale LargeGT to ogbn-papers100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z) - Hybrid Focal and Full-Range Attention Based Graph Transformers [0.0]
We present a purely attention-based architecture, namely the Focal and Full-Range Graph Transformer (FFGT).
FFGT combines the conventional full-range attention with K-hop focal attention on ego-nets to aggregate both global and local information.
Our approach enhances the performance of existing Graph Transformers on various open datasets.
arXiv Detail & Related papers (2023-11-08T12:53:07Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a single layer of attention can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Learning Strong Graph Neural Networks with Weak Information [64.64996100343602]
We develop a principled approach to the problem of graph learning with weak information (GLWI).
We propose D$2$PT, a dual-channel GNN framework that performs long-range information propagation not only on the input graph with incomplete structure but also on a global graph that encodes global semantic similarities.
arXiv Detail & Related papers (2023-05-29T04:51:09Z) - Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model by contrasting cluster assignments, called GRCCA.
It is motivated by jointly exploiting local and global information through a combination of clustering algorithms and contrastive learning.
GRCCA is strongly competitive in most tasks.
arXiv Detail & Related papers (2021-12-15T07:28:58Z) - Graph Representation Learning via Graphical Mutual Information Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)