Do Transformers Really Perform Bad for Graph Representation?
- URL: http://arxiv.org/abs/2106.05234v1
- Date: Wed, 9 Jun 2021 17:18:52 GMT
- Title: Do Transformers Really Perform Bad for Graph Representation?
- Authors: Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen and Tie-Yan Liu
- Abstract summary: We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding the structural information of a graph into the model.
- Score: 62.68420868623308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer architecture has become a dominant choice in many domains,
such as natural language processing and computer vision. Yet, it has not
achieved competitive performance on popular leaderboards of graph-level
prediction compared to mainstream GNN variants. Therefore, it remains a mystery
how Transformers could perform well for graph representation learning. In this
paper, we solve this mystery by presenting Graphormer, which is built upon the
standard Transformer architecture, and could attain excellent results on a
broad range of graph representation learning tasks, especially on the recent
OGB Large-Scale Challenge. Our key insight for utilizing Transformers on graphs
is the necessity of effectively encoding the structural information of a graph
into the model. To this end, we propose several simple yet effective structural
encoding methods to help Graphormer better model graph-structured data. Besides,
we mathematically characterize the expressive power of Graphormer and show that,
with our ways of encoding the structural information of graphs, many popular
GNN variants are covered as special cases of Graphormer.
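The structural encodings described above center on biasing self-attention with graph structure; in the paper, a learnable scalar indexed by the shortest-path distance between two nodes is added to each attention logit (spatial encoding), alongside degree-based centrality and edge encodings. Below is a minimal NumPy sketch of the spatial-encoding idea, assuming single-head attention and a clipped shortest-path distance; function and variable names are illustrative, not taken from the released code.

```python
# Minimal sketch (assumptions noted above): Graphormer-style spatial encoding,
# i.e. a learnable scalar bias indexed by shortest-path distance added to the
# attention logits. Names/shapes are illustrative, not the official implementation.
import numpy as np

def shortest_path_distances(adj):
    """All-pairs shortest-path hop counts via Floyd-Warshall (np.inf = unreachable)."""
    n = adj.shape[0]
    dist = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

def graphormer_attention(h, adj, Wq, Wk, Wv, spatial_bias, max_dist=5):
    """
    h: (n, d) node features; adj: (n, n) adjacency matrix.
    spatial_bias: (max_dist + 2,) learnable scalars, one per clipped shortest-path
    distance, with the last slot reserved for unreachable node pairs.
    """
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    dk = Wq.shape[1]
    scores = q @ k.T / np.sqrt(dk)

    spd = shortest_path_distances(adj)
    idx = np.where(np.isinf(spd), max_dist + 1, np.minimum(spd, max_dist)).astype(int)
    scores = scores + spatial_bias[idx]  # the structural bias b_{phi(i,j)}

    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))  # row-wise softmax
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
    h = rng.normal(size=(3, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    bias = np.zeros(7)  # distances 0..5 plus an "unreachable" slot
    print(graphormer_attention(h, adj, Wq, Wk, Wv, bias).shape)  # (3, 8)
```

The full model in the paper additionally adds degree-based centrality embeddings to the node inputs and an edge-feature term to the attention bias; both are omitted here for brevity.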
Related papers
- Technical Report: The Graph Spectral Token -- Enhancing Graph Transformers with Spectral Information [0.8184895397419141]
Graph Transformers have emerged as a powerful alternative to Message-Passing Graph Neural Networks (MP-GNNs).
We propose the Graph Spectral Token, a novel approach to directly encode graph spectral information.
We benchmark the effectiveness of our approach by enhancing two existing graph transformers, GraphTrans and SubFormer.
arXiv Detail & Related papers (2024-04-08T15:24:20Z)
- Graph Transformers without Positional Encodings [0.7252027234425334]
We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph.
We empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks.
arXiv Detail & Related papers (2024-01-31T12:33:31Z)
- Transformers as Graph-to-Graph Models [13.630495199720423]
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case.
Our Graph-to-Graph Transformer architecture makes this ability explicit by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions (see the sketch after this list).
arXiv Detail & Related papers (2023-10-27T07:21:37Z) - Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z)
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that one-layer attention can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone opens up a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
- Diffusing Graph Attention [15.013509382069046]
We develop a new model for Graph Transformers that integrates the arbitrary graph structure into the architecture.
Graph Diffuser (GD) learns to extract structural and positional relationships between distant nodes in the graph, which it then uses to direct the Transformer's attention and node representations.
Experiments on eight benchmarks show Graph Diffuser to be a highly competitive model, outperforming the state-of-the-art in a diverse set of domains.
arXiv Detail & Related papers (2023-03-01T16:11:05Z)
- Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers suffer from the bottleneck of improving performance by increasing depth.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
arXiv Detail & Related papers (2023-03-01T15:22:40Z)
- Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and competitive results compared to Transformer variants with sophisticated graph-specific inductive biases.
arXiv Detail & Related papers (2022-07-06T08:13:06Z)
- Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks.
We first dissect the existing models and summarize three typical ways to incorporate graph information into the vanilla Transformer.
Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z)
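The Transformers as Graph-to-Graph Models entry above describes feeding graph edges into the attention computation and reading edges back out with attention-like functions. The following is a rough, assumption-heavy sketch of that general pattern, not the paper's actual architecture: embedded edge labels contribute a pairwise term to the attention logits, and edges are predicted from a bilinear score over node-pair representations. All names are hypothetical.

```python
# Rough sketch of the graph-to-graph pattern summarized above: edge labels enter
# the attention logits, and edges are predicted with an attention-like bilinear
# score. Hypothetical names; not taken from the paper's released code.
import numpy as np

def attention_logits_with_edge_inputs(h, edge_labels, edge_emb, Wq, Wk):
    """
    h: (n, d) token states; edge_labels: (n, n) integer ids (0 = no edge).
    edge_emb: (num_labels, dk) edge-label embeddings sharing the key dimension dk.
    Returns (n, n) attention logits biased by the input graph's edges.
    """
    q, k = h @ Wq, h @ Wk
    dk = Wq.shape[1]
    logits = q @ k.T / np.sqrt(dk)
    # pairwise term: each query interacts with the embedded label of edge (i, j)
    logits = logits + np.einsum("id,ijd->ij", q, edge_emb[edge_labels]) / np.sqrt(dk)
    return logits

def predict_edge_probs(h, Wu, Wv):
    """Attention-like edge prediction: sigmoid of a bilinear score over node pairs."""
    scores = (h @ Wu) @ (h @ Wv).T
    return 1.0 / (1.0 + np.exp(-scores))
```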
This list is automatically generated from the titles and abstracts of the papers in this site.