SGFormer: Simplifying and Empowering Transformers for Large-Graph
Representations
- URL: http://arxiv.org/abs/2306.10759v4
- Date: Thu, 4 Jan 2024 14:19:18 GMT
- Title: SGFormer: Simplifying and Empowering Transformers for Large-Graph
Representations
- Authors: Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian
Jiang, Yatao Bian, Junchi Yan
- Abstract summary: We show that a single attention layer can yield surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology illuminates a new technical path of independent interest for building Transformers on large graphs.
- Score: 78.97396248946174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning representations on large graphs is a long-standing challenge
due to the inherent inter-dependence among massive data points.
Transformers, as an emerging class of foundation encoders for graph-structured
data, have shown promising performance on small graphs thanks to global
attention capable of capturing all-pair influence beyond neighboring nodes.
Even so, existing approaches tend to inherit the spirit of Transformers in
language and vision tasks, embracing complicated models that stack deep
multi-head attention. In this paper, we critically demonstrate that even a
single attention layer can yield surprisingly competitive performance across
node property prediction benchmarks whose node counts range from thousands to
billions. This encourages us to rethink the design philosophy for Transformers
on large graphs, where global attention is a computational overhead that
hinders scalability. We frame the proposed scheme as Simplified Graph
Transformers (SGFormer), which is empowered by a simple attention model that
can efficiently propagate information among arbitrary nodes in one layer.
SGFormer requires no positional encodings, feature/graph pre-processing, or
augmented losses. Empirically, SGFormer successfully scales to the web-scale
graph ogbn-papers100M and yields up to 141x inference acceleration over SOTA
Transformers on medium-sized graphs. Beyond the current results, we believe
the proposed methodology alone illuminates a new technical path of independent
interest for building Transformers on large graphs.
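The key efficiency idea described above, all-pair propagation in a single attention layer, can be illustrated with the standard linear-attention associativity trick, where (QKᵀ)V is computed as Q(KᵀV) to cut the cost from O(N²d) to O(Nd²). The sketch below is a minimal, hypothetical illustration of this kind of simple one-layer attention in NumPy; it is not the authors' exact SGFormer formulation, and the normalization scheme is an assumption for the sake of a self-contained example:

```python
import numpy as np

def simple_global_attention(X, Wq, Wk, Wv):
    """One-layer global attention over all N nodes in O(N d^2) time.

    A hypothetical sketch in the spirit of SGFormer's simplified
    attention: every node aggregates information from every other
    node without materializing the N x N attention matrix.
    """
    N = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Row-normalize queries and keys so the aggregation stays well-scaled
    # (an assumed stand-in for the paper's normalization).
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    K = K / np.linalg.norm(K, axis=1, keepdims=True)
    # Associativity trick: (Q K^T) V == Q (K^T V).
    # K^T V is a (d x d) summary of all nodes, so cost is linear in N.
    KV = K.T @ V
    return Q @ KV / N

# Usage: 10 nodes with 4-dimensional features, shared random projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
W = rng.normal(size=(4, 4))
Z = simple_global_attention(X, W, W, W)  # shape (10, 4)
```

Because the N x N score matrix is never formed, this single layer scales linearly in the number of nodes, which is the property that lets one-layer global attention reach graphs at the scale of ogbn-papers100M.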
Related papers
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, and scale LargeGT to ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- Hybrid Focal and Full-Range Attention Based Graph Transformers [0.0]
We present a purely attention-based architecture, the Focal and Full-Range Graph Transformer (FFGT).
FFGT combines the conventional full-range attention with K-hop focal attention on ego-nets to aggregate both global and local information.
Our approach enhances the performance of existing Graph Transformers on various open datasets.
arXiv Detail & Related papers (2023-11-08T12:53:07Z) - Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z)
- AGFormer: Efficient Graph Representation with Anchor-Graph Transformer [95.1825252182316]
We propose a novel graph Transformer architecture, termed Anchor Graph Transformer (AGFormer).
AGFormer first obtains a set of representative anchors and then converts node-to-node message passing into an anchor-to-anchor and anchor-to-node message-passing process.
Extensive experiments on several benchmark datasets demonstrate the effectiveness and benefits of the proposed AGFormer.
arXiv Detail & Related papers (2023-05-12T14:35:42Z)
- Hierarchical Transformer for Scalable Graph Learning [22.462712609402324]
Graph Transformer has demonstrated state-of-the-art performance on benchmarks for graph representation learning.
The complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs.
We introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges.
HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance.
arXiv Detail & Related papers (2023-05-04T14:23:22Z)
- Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers hit a bottleneck when trying to improve performance by increasing depth.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
arXiv Detail & Related papers (2023-03-01T15:22:40Z)
- Hierarchical Graph Transformer with Adaptive Node Sampling [19.45896788055167]
We identify the main deficiencies of current graph transformers.
Most sampling strategies only focus on local neighbors and neglect the long-range dependencies in the graph.
We propose a hierarchical attention scheme with graph coarsening to capture the long-range interactions.
arXiv Detail & Related papers (2022-10-08T05:53:25Z)
- Gophormer: Ego-Graph Transformer for Node Classification [27.491500255498845]
In this paper, we propose a novel Gophormer model which applies transformers on ego-graphs instead of full-graphs.
Specifically, a Node2Seq module is proposed to sample ego-graphs as the input to the transformer, alleviating the scalability challenge.
In order to handle the uncertainty introduced by the ego-graph sampling, we propose a consistency regularization and a multi-sample inference strategy.
arXiv Detail & Related papers (2021-10-25T16:43:32Z)
- Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding a graph's structural information into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.