Related papers: Simplifying Graph Transformers

Simplifying Graph Transformers

URL: http://arxiv.org/abs/2504.12588v1
Date: Thu, 17 Apr 2025 02:06:50 GMT
Title: Simplifying Graph Transformers
Authors: Liheng Ma, Soumyasundar Pal, Yingxue Zhang, Philip H. S. Torr, Mark Coates,
Abstract summary: We propose three simple modifications to the plain Transformer to render it applicable to graphs without introducing major architectural distortions.<n>Specifically, we advocate for the use of (1) simplified $L$ attention to measure the magnitude of closeness tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a relative positional encoding bias with a shared encoder.
Score: 64.50059165186701
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers have attained outstanding performance across various modalities, employing scaled-dot-product (SDP) attention mechanisms. Researchers have attempted to migrate Transformers to graph learning, but most advanced Graph Transformers are designed with major architectural differences, either integrating message-passing or incorporating sophisticated attention mechanisms. These complexities prevent the easy adoption of Transformer training advances. We propose three simple modifications to the plain Transformer to render it applicable to graphs without introducing major architectural distortions. Specifically, we advocate for the use of (1) simplified $L_2$ attention to measure the magnitude closeness of tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a relative positional encoding bias with a shared encoder. Significant performance gains across a variety of graph datasets justify the effectiveness of our proposed modifications. Furthermore, empirical evaluation on the expressiveness benchmark reveals noteworthy realized expressiveness in the graph isomorphism.

Related papers

Plain Transformers are Surprisingly Powerful Link Predictors [57.01966734467712]
Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies.<n>While Graph Neural Networks (GNNs) are the standard solution, state-of-the-art pipelines often rely on explicit structurals or memory-intensive node embeddings.<n>We present PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs.
arXiv Detail & Related papers (2026-02-02T02:45:52Z)
SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs. We show that one-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning. It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors [16.04850782310842]
We build interpretable and lightweight transformer-like neural networks by unrolling iterative optimization algorithms. A normalized signal-dependent graph learning module amounts to a variant of the basic self-attention mechanism in conventional transformers.
arXiv Detail & Related papers (2024-06-06T14:01:28Z)
Automatic Graph Topology-Aware Transformer [50.2807041149784]
We build a comprehensive graph Transformer search space with the micro-level and macro-level designs. EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level. We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks.
arXiv Detail & Related papers (2024-05-30T07:44:31Z)
Graph Convolutions Enrich the Self-Attention in Transformers! [23.47074245564352]
We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective one, whose complexity is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
arXiv Detail & Related papers (2023-12-07T11:40:32Z)
SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model. We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
Graph Inductive Biases in Transformers without Message Passing [47.238185813842996]
New Graph Inductive bias Transformer (GRIT) incorporates graph inductive biases without using message passing. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets.
arXiv Detail & Related papers (2023-05-27T22:26:27Z)
Transformers over Directed Acyclic Graphs [6.263470141349622]
We study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs. We show that it is effective in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.
arXiv Detail & Related papers (2022-10-24T12:04:52Z)
Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice. We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers. Our method coined Tokenized Graph Transformer (TokenGT) achieves significantly better results compared to GNN baselines and competitive results.
arXiv Detail & Related papers (2022-07-06T08:13:06Z)
Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs [3.0603554929274908]
3D-related inductive biases are indispensable to graph neural networks operating on 3D atomistic graphs such as molecules. Inspired by the success of Transformers in various domains, we study how to incorporate these inductive biases into Transformers. We present Equiformer, a graph neural network leveraging the strength of Transformer architectures.
arXiv Detail & Related papers (2022-06-23T21:40:37Z)
Relphormer: Relational Graph Transformer for Knowledge Graph Representations [25.40961076988176]
We propose a new variant of Transformer for knowledge graph representations dubbed Relphormer. We propose a novel structure-enhanced self-attention mechanism to encode the relational information and keep the semantic information within entities and relations. Experimental results on six datasets show that Relphormer can obtain better performance compared with baselines.
arXiv Detail & Related papers (2022-05-22T15:30:18Z)
Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks. We first disassemble the existing models and conclude three typical ways to incorporate the graph information into the vanilla Transformer. Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z)
Gophormer: Ego-Graph Transformer for Node Classification [27.491500255498845]
In this paper, we propose a novel Gophormer model which applies transformers on ego-graphs instead of full-graphs. Specifically, Node2Seq module is proposed to sample ego-graphs as the input of transformers, which alleviates the challenge of scalability. In order to handle the uncertainty introduced by the ego-graph sampling, we propose a consistency regularization and a multi-sample inference strategy.
arXiv Detail & Related papers (2021-10-25T16:43:32Z)
Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture. Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.