Less is More: on the Over-Globalizing Problem in Graph Transformers
- URL: http://arxiv.org/abs/2405.01102v2
- Date: Fri, 24 May 2024 08:53:13 GMT
- Title: Less is More: on the Over-Globalizing Problem in Graph Transformers
- Authors: Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi,
- Abstract summary: The global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes.
We show that the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened.
We propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer) to prevent the over-globalizing problem.
- Score: 34.52455014631614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformer by presenting both empirical evidence and theoretical analysis, i.e., the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened. Then we propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), including the inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes. Moreover, the collaborative training is proposed to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs well validate the effectiveness of our proposed CoBFormer.
Related papers
- Learning a Mini-batch Graph Transformer via Two-stage Interaction Augmentation [34.969019293698885]
Mini-batch Graph Transformer (MGT) has demonstrated significant advantages in semi-supervised node prediction tasks.
The limited number of nodes in each mini-batch restricts the model's capacity to capture the global characteristic of the graph.
We propose LGMformer, a novel MGT model that employs a two-stage augmented interaction strategy.
arXiv Detail & Related papers (2024-07-13T14:42:22Z) - Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z) - Hybrid Focal and Full-Range Attention Based Graph Transformers [0.0]
We present a purely attention-based architecture, namely Focal and Full-Range Graph Transformer (FFGT)
FFGT combines the conventional full-range attention with K-hop focal attention on ego-nets to aggregate both global and local information.
Our approach enhances the performance of existing Graph Transformers on various open datasets.
arXiv Detail & Related papers (2023-11-08T12:53:07Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph
Representations [78.97396248946174]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Learning Strong Graph Neural Networks with Weak Information [64.64996100343602]
We develop a principled approach to the problem of graph learning with weak information (GLWI)
We propose D$2$PT, a dual-channel GNN framework that performs long-range information propagation on the input graph with incomplete structure, but also on a global graph that encodes global semantic similarities.
arXiv Detail & Related papers (2023-05-29T04:51:09Z) - Graph Transformer GANs for Graph-Constrained House Generation [223.739067413952]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The GTGAN learns effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.
arXiv Detail & Related papers (2023-03-14T20:35:45Z) - Augmenting Knowledge Transfer across Graphs [16.50013525404218]
We present TRANSNET, a generic learning framework for augmenting knowledge transfer across graphs.
In particular, we introduce a novel notion named trinity signal that can naturally formulate various graph signals at different granularity.
We show that TRANSNET outperforms all existing approaches on seven benchmark datasets by a significant margin.
arXiv Detail & Related papers (2022-12-09T08:46:02Z) - Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model by contrasting cluster assignments, called as GRCCA.
It is motivated to make good use of local and global information synthetically through combining clustering algorithms and contrastive learning.
GRCCA has strong competitiveness in most tasks.
arXiv Detail & Related papers (2021-12-15T07:28:58Z) - Edge-augmented Graph Transformers: Global Self-attention is Enough for
Graphs [24.796242917673755]
We propose a simple yet powerful extension to the transformer - residual edge channels.
The resultant framework, which we call Edge-augmented Graph Transformer (EGT), can directly accept, process and output structural information as well as node information.
Our framework, which relies on global node feature aggregation, achieves better performance compared to Graph Convolutional Networks (GCN)
arXiv Detail & Related papers (2021-08-07T02:18:11Z) - Graph Representation Learning via Graphical Mutual Information
Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.