Plain Transformers Can be Powerful Graph Learners
- URL: http://arxiv.org/abs/2504.12588v2
- Date: Tue, 20 May 2025 20:36:09 GMT
- Title: Plain Transformers Can be Powerful Graph Learners
- Authors: Liheng Ma, Soumyasundar Pal, Yingxue Zhang, Philip H. S. Torr, Mark Coates
- Abstract summary: Researchers have attempted to migrate Transformers to graph learning, but most advanced Graph Transformers have strayed far from plain Transformers. This work demonstrates that the plain Transformer architecture can be a powerful graph learner.
- Score: 64.50059165186701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have attained outstanding performance across various modalities, owing to their simple but powerful scaled-dot-product (SDP) attention mechanisms. Researchers have attempted to migrate Transformers to graph learning, but most advanced Graph Transformers (GTs) have strayed far from plain Transformers, exhibiting major architectural differences either by integrating message-passing or incorporating sophisticated attention mechanisms. These divergences hinder the easy adoption of training advances for Transformers developed in other domains. Contrary to previous GTs, this work demonstrates that the plain Transformer architecture can be a powerful graph learner. To achieve this, we propose to incorporate three simple, minimal, and easy-to-implement modifications to the plain Transformer architecture to construct our Powerful Plain Graph Transformers (PPGT): (1) simplified $L_2$ attention for measuring the magnitude closeness among tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a simple MLP-based stem for graph positional encoding. Consistent with its theoretical expressivity, PPGT demonstrates noteworthy realized expressivity on the empirical graph expressivity benchmark, comparing favorably to more complicated competitors such as subgraph GNNs and higher-order GNNs. Its outstanding empirical performance across various graph datasets also justifies the practical effectiveness of PPGT.
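The abstract does not include an implementation; the following is a minimal PyTorch sketch of what the first two modifications might look like, assuming attention scores taken as negative squared $L_2$ distances between queries and keys and an RMS normalization whose gain is predicted from each token's magnitude. The class names SimplifiedL2Attention and AdaptiveRMSNorm are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only -- not the authors' implementation of PPGT.
# Assumes: attention scored by negative squared L2 distances, and an RMS
# norm whose per-token gain is predicted from the token's own magnitude
# (both are hedged readings of the abstract).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplifiedL2Attention(nn.Module):
    """Single-head attention scored by -||q - k||^2 / sqrt(d) (hypothetical)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, tokens, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Squared L2 distance between every query/key pair.
        dist2 = torch.cdist(q, k, p=2) ** 2           # (batch, tokens, tokens)
        attn = F.softmax(-dist2 * self.scale, dim=-1)
        return attn @ v


class AdaptiveRMSNorm(nn.Module):
    """RMS norm whose gain depends on the token's RMS magnitude, so that
    magnitude information is not fully discarded (hypothetical reading)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.gain = nn.Linear(1, dim)  # predicts a gain from the token magnitude

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        normed = x / (rms + self.eps)
        return normed * self.gain(rms)


if __name__ == "__main__":
    tokens = torch.randn(2, 16, 64)   # batch of 2 graphs, 16 node tokens, dim 64
    out = SimplifiedL2Attention(64)(AdaptiveRMSNorm(64)(tokens))
    print(out.shape)                  # torch.Size([2, 16, 64])
```

Scoring by distance rather than dot product makes the attention weights sensitive to how close token magnitudes are, which appears to be the property the abstract emphasizes.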
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors [16.04850782310842]
We build interpretable and lightweight transformer-like neural networks by unrolling iterative optimization algorithms.
A normalized signal-dependent graph learning module amounts to a variant of the basic self-attention mechanism in conventional transformers.
arXiv Detail & Related papers (2024-06-06T14:01:28Z) - Automatic Graph Topology-Aware Transformer [50.2807041149784]
We build a comprehensive graph Transformer search space with the micro-level and macro-level designs.
EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level.
We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks.
arXiv Detail & Related papers (2024-05-30T07:44:31Z) - Graph Convolutions Enrich the Self-Attention in Transformers! [23.47074245564352]
We propose graph-filter-based self-attention (GFSA) to learn a more general yet effective graph filter, with complexity only slightly larger than that of the original self-attention mechanism.
We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
arXiv Detail & Related papers (2023-12-07T11:40:32Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a single-layer attention can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Graph Inductive Biases in Transformers without Message Passing [47.238185813842996]
The Graph Inductive bias Transformer (GRIT) incorporates graph inductive biases without using message passing.
GRIT achieves state-of-the-art empirical performance across a variety of graph datasets.
arXiv Detail & Related papers (2023-05-27T22:26:27Z) - Transformers over Directed Acyclic Graphs [6.263470141349622]
We study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs.
We show that it is effective in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.
arXiv Detail & Related papers (2022-10-24T12:04:52Z) - Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and competitive results overall; a hedged tokenization sketch follows the related-papers list below.
arXiv Detail & Related papers (2022-07-06T08:13:06Z) - Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs [3.0603554929274908]
3D-related inductive biases are indispensable to graph neural networks operating on 3D atomistic graphs such as molecules.
Inspired by the success of Transformers in various domains, we study how to incorporate these inductive biases into Transformers.
We present Equiformer, a graph neural network leveraging the strength of Transformer architectures.
arXiv Detail & Related papers (2022-06-23T21:40:37Z) - Relphormer: Relational Graph Transformer for Knowledge Graph Representations [25.40961076988176]
We propose a new variant of Transformer for knowledge graph representations dubbed Relphormer.
We propose a novel structure-enhanced self-attention mechanism to encode the relational information and keep the semantic information within entities and relations.
Experimental results on six datasets show that Relphormer can obtain better performance compared with baselines.
arXiv Detail & Related papers (2022-05-22T15:30:18Z) - Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks.
We first disassemble the existing models and identify three typical ways of incorporating graph information into the vanilla Transformer.
Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z) - Gophormer: Ego-Graph Transformer for Node Classification [27.491500255498845]
In this paper, we propose a novel Gophormer model which applies transformers on ego-graphs instead of full-graphs.
Specifically, Node2Seq module is proposed to sample ego-graphs as the input of transformers, which alleviates the challenge of scalability.
In order to handle the uncertainty introduced by the ego-graph sampling, we propose a consistency regularization and a multi-sample inference strategy.
arXiv Detail & Related papers (2021-10-25T16:43:32Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
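As flagged in the TokenGT entry above, here is a minimal sketch of a node-plus-edge tokenization in that spirit, assuming each edge token carries the identifier vectors of both endpoint nodes and a learnable type embedding distinguishes node tokens from edge tokens. The function tokenize_graph and all parameter names are illustrative assumptions, not taken from the paper's code.

```python
# Illustrative sketch of a node-plus-edge tokenization in the spirit of
# TokenGT (hedged; names and details are assumptions, not the paper's code).
import torch
import torch.nn as nn


def tokenize_graph(node_feats, edge_feats, edge_index, node_ids, type_emb):
    """Build one token per node and per edge for a plain Transformer.

    node_feats: (num_nodes, d)    node features
    edge_feats: (num_edges, d)    edge features
    edge_index: (2, num_edges)    source/target node indices
    node_ids:   (num_nodes, p)    per-node identifier vectors (e.g. random)
    type_emb:   (2, d + 2 * p)    learnable embeddings for node vs. edge tokens
    """
    src, dst = edge_index
    # Node token: [features | its own identifier repeated twice].
    node_tokens = torch.cat([node_feats, node_ids, node_ids], dim=-1)
    # Edge token: [features | identifiers of both endpoints].
    edge_tokens = torch.cat([edge_feats, node_ids[src], node_ids[dst]], dim=-1)
    tokens = torch.cat([node_tokens + type_emb[0], edge_tokens + type_emb[1]], dim=0)
    return tokens  # (num_nodes + num_edges, d + 2 * p), usable as a plain token sequence


if __name__ == "__main__":
    d, p, n, m = 16, 8, 5, 7
    tokens = tokenize_graph(
        node_feats=torch.randn(n, d),
        edge_feats=torch.randn(m, d),
        edge_index=torch.randint(0, n, (2, m)),
        node_ids=torch.randn(n, p),
        type_emb=nn.Parameter(torch.randn(2, d + 2 * p)),
    )
    print(tokens.shape)  # torch.Size([12, 32])
```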
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.