Are More Layers Beneficial to Graph Transformers?
- URL: http://arxiv.org/abs/2303.00579v1
- Date: Wed, 1 Mar 2023 15:22:40 GMT
- Title: Are More Layers Beneficial to Graph Transformers?
- Authors: Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei
- Abstract summary: Current graph transformers suffer from the bottleneck of improving performance by increasing depth.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
- Score: 97.05661983225603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although going deep has proven successful in many neural architectures,
existing graph transformers remain relatively shallow. In this work, we
explore whether more layers are beneficial to graph transformers, and find that
current graph transformers suffer from the bottleneck of improving performance
by increasing depth. Our further analysis reveals that deep graph transformers
are limited by the vanishing capacity of global attention, which restricts
them from focusing on critical substructures
and obtaining expressive features. To this end, we propose a novel graph
transformer model named DeepGraph that explicitly employs substructure tokens
in the encoded representation, and applies local attention to related nodes to
obtain substructure-based attention encoding. Our model enhances the ability of
the global attention to focus on substructures and promotes the expressiveness
of the representations, addressing the limitation of self-attention as the
graph transformer deepens. Experiments show that our method removes the depth
limitation of graph transformers and achieves state-of-the-art performance
across various graph benchmarks with deeper models.
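The following is a minimal sketch, under assumed PyTorch conventions, of how substructure tokens with local attention masking could be wired into a transformer encoder layer, as described in the abstract. The helper and class names (substructure_attention_mask, SubstructureEncoderLayer), the fixed substructure lists, and the dimensions are hypothetical illustrations, not the authors' DeepGraph implementation.

```python
# Hypothetical sketch of substructure tokens with local attention masking;
# not the authors' DeepGraph code. Substructure tokens are appended to the
# node tokens, and a boolean mask restricts each substructure token to
# attend only to its own nodes (and itself), while node tokens keep
# global attention.
import torch
import torch.nn as nn

def substructure_attention_mask(num_nodes, substructures):
    """Boolean mask (True = attention blocked) over [nodes | substructure tokens]."""
    total = num_nodes + len(substructures)
    mask = torch.zeros(total, total, dtype=torch.bool)   # node rows: global attention
    for i, nodes in enumerate(substructures):
        row = num_nodes + i
        mask[row, :] = True                               # block everything for this token...
        mask[row, torch.tensor(nodes)] = False            # ...except its substructure's nodes
        mask[row, row] = False                            # ...and the token itself
    return mask

class SubstructureEncoderLayer(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, attn_mask):
        h, _ = self.attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))

# Toy usage: 6 nodes and two sampled substructures (e.g. triangles; indices made up).
num_nodes, dim = 6, 64
substructures = [[0, 1, 2], [3, 4, 5]]
node_tokens = torch.randn(1, num_nodes, dim)
sub_tokens = torch.randn(1, len(substructures), dim)      # learned embeddings in practice
tokens = torch.cat([node_tokens, sub_tokens], dim=1)
mask = substructure_attention_mask(num_nodes, substructures)
out = SubstructureEncoderLayer(dim)(tokens, mask)         # shape: (1, 8, 64)
```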
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer attentions can be reduced to one-layer propagation, with the same capability for representation learning.
This suggests a new technical path for building powerful and efficient Transformers on graphs (a minimal sketch of the single-layer idea appears after this list).
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - Automatic Graph Topology-Aware Transformer [50.2807041149784]
We build a comprehensive graph Transformer search space with the micro-level and macro-level designs.
EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level.
We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks.
arXiv Detail & Related papers (2024-05-30T07:44:31Z) - Technical Report: The Graph Spectral Token -- Enhancing Graph Transformers with Spectral Information [0.8184895397419141]
Graph Transformers have emerged as a powerful alternative to Message-Passing Graph Neural Networks (MP-GNNs).
We propose the Graph Spectral Token, a novel approach to directly encode graph spectral information.
We benchmark the effectiveness of our approach by enhancing two existing graph transformers, GraphTrans and SubFormer.
arXiv Detail & Related papers (2024-04-08T15:24:20Z) - Graph Transformers without Positional Encodings [0.7252027234425334]
We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph.
We empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks.
arXiv Detail & Related papers (2024-01-31T12:33:31Z) - Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone opens up a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Gophormer: Ego-Graph Transformer for Node Classification [27.491500255498845]
In this paper, we propose a novel Gophormer model which applies transformers on ego-graphs instead of full-graphs.
Specifically, a Node2Seq module is proposed to sample ego-graphs as the input of transformers, which alleviates the challenge of scalability.
In order to handle the uncertainty introduced by the ego-graph sampling, we propose a consistency regularization and a multi-sample inference strategy.
arXiv Detail & Related papers (2021-10-25T16:43:32Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
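As a companion to the two SGFormer entries above, here is a minimal sketch, under assumed PyTorch conventions, of the single-layer idea: one global all-pair attention combined with one hop of graph propagation. The class name, mixing weight, and toy graph are hypothetical and do not reproduce the released SGFormer implementation.

```python
# Hypothetical single-layer "global attention + one-hop propagation" block
# in the spirit of the SGFormer entries above; not the released code.
import torch
import torch.nn as nn

class SingleLayerGraphAttention(nn.Module):
    def __init__(self, dim=64, alpha=0.5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.alpha = alpha                      # mixes global attention with local propagation

    def forward(self, x, adj):
        # x: (N, dim) node features; adj: (N, N) row-normalized adjacency
        scores = self.q(x) @ self.k(x).T / x.size(-1) ** 0.5
        global_term = torch.softmax(scores, dim=-1) @ self.v(x)   # one all-pair attention layer
        local_term = adj @ x                                      # one hop of neighbor aggregation
        return self.alpha * global_term + (1 - self.alpha) * local_term

# Toy usage on a 4-node path graph.
x = torch.randn(4, 64)
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
adj = adj / adj.sum(dim=-1, keepdim=True)      # row-normalize
out = SingleLayerGraphAttention()(x, adj)      # shape: (4, 64)
```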
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.