SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations
- URL: http://arxiv.org/abs/2306.10759v5
- Date: Fri, 16 Aug 2024 08:24:25 GMT
- Title: SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations
- Authors: Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian Jiang, Yatao Bian, Junchi Yan,
- Abstract summary: We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
- Score: 75.71298846760303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning representations on large-sized graphs is a long-standing challenge due to the inter-dependence nature involved in massive data points. Transformers, as an emerging class of foundation encoders for graph-structured data, have shown promising performance on small graphs due to its global attention capable of capturing all-pair influence beyond neighboring nodes. Even so, existing approaches tend to inherit the spirit of Transformers in language and vision tasks, and embrace complicated models by stacking deep multi-head attentions. In this paper, we critically demonstrate that even using a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks where node numbers range from thousand-level to billion-level. This encourages us to rethink the design philosophy for Transformers on large graphs, where the global attention is a computation overhead hindering the scalability. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model that can efficiently propagate information among arbitrary nodes in one layer. SGFormer requires none of positional encodings, feature/graph pre-processing or augmented loss. Empirically, SGFormer successfully scales to the web-scale graph ogbn-papers100M and yields up to 141x inference acceleration over SOTA Transformers on medium-sized graphs. Beyond current results, we believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that one-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - Masked Graph Transformer for Large-Scale Recommendation [56.37903431721977]
We propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with a linear complexity.
Experimental results show the superior performance of our MGFormer, even with a single attention layer.
arXiv Detail & Related papers (2024-05-07T06:00:47Z) - Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z) - Hybrid Focal and Full-Range Attention Based Graph Transformers [0.0]
We present a purely attention-based architecture, namely Focal and Full-Range Graph Transformer (FFGT)
FFGT combines the conventional full-range attention with K-hop focal attention on ego-nets to aggregate both global and local information.
Our approach enhances the performance of existing Graph Transformers on various open datasets.
arXiv Detail & Related papers (2023-11-08T12:53:07Z) - Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z) - Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers suffer from the bottleneck of improving performance by increasing depth.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
arXiv Detail & Related papers (2023-03-01T15:22:40Z) - Gophormer: Ego-Graph Transformer for Node Classification [27.491500255498845]
In this paper, we propose a novel Gophormer model which applies transformers on ego-graphs instead of full-graphs.
Specifically, Node2Seq module is proposed to sample ego-graphs as the input of transformers, which alleviates the challenge of scalability.
In order to handle the uncertainty introduced by the ego-graph sampling, we propose a consistency regularization and a multi-sample inference strategy.
arXiv Detail & Related papers (2021-10-25T16:43:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.