Transformers are Graph Neural Networks
- URL: http://arxiv.org/abs/2506.22084v1
- Date: Fri, 27 Jun 2025 10:15:33 GMT
- Title: Transformers are Graph Neural Networks
- Authors: Chaitanya K. Joshi
- Abstract summary: We show how Transformers can be viewed as message passing GNNs operating on fully connected graphs of tokens. Transformers are expressive set processing networks that learn relationships among input elements without being constrained by a priori graphs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish connections between the Transformer architecture, originally introduced for natural language processing, and Graph Neural Networks (GNNs) for representation learning on graphs. We show how Transformers can be viewed as message passing GNNs operating on fully connected graphs of tokens, where the self-attention mechanism captures the relative importance of all tokens w.r.t. each other, and positional encodings provide hints about sequential ordering or structure. Thus, Transformers are expressive set processing networks that learn relationships among input elements without being constrained by a priori graphs. Despite this mathematical connection to GNNs, Transformers are implemented via dense matrix operations that are significantly more efficient on modern hardware than sparse message passing. This leads to the perspective that Transformers are GNNs currently winning the hardware lottery.
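The equivalence the abstract describes can be sketched concretely: single-head self-attention over n tokens is exactly attention-weighted message aggregation on a fully connected graph, and the same computation masked by an explicit adjacency matrix recovers GNN-style message passing on a sparse graph. The NumPy sketch below is illustrative only; the function names `self_attention` and `message_passing` are not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: each token aggregates value vectors
    ("messages") from every token, weighted by attention scores --
    i.e. message passing on a fully connected graph of tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))   # (n, n) attention matrix = edge weights
    return A @ V                        # weighted sum of neighbour messages

def message_passing(X, Wq, Wk, Wv, adj):
    """The same computation phrased as GNN-style message passing over an
    explicit adjacency matrix; with adj = all-ones (a complete graph)
    it coincides exactly with self_attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(adj > 0, scores, -np.inf)  # mask out non-edges
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, dm = 4, 8
X = rng.standard_normal((n, dm))
Wq, Wk, Wv = (rng.standard_normal((dm, dm)) for _ in range(3))
out_attn = self_attention(X, Wq, Wk, Wv)
out_mp = message_passing(X, Wq, Wk, Wv, np.ones((n, n)))
print(np.allclose(out_attn, out_mp))  # True
```

Passing a sparse 0/1 adjacency matrix to `message_passing` instead of the all-ones matrix restricts aggregation to graph neighbours, which is the sparse GNN case the abstract contrasts with the dense, hardware-friendly Transformer formulation.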
Related papers
- Graph Neural Networks as a Substitute for Transformers in Single-Cell Transcriptomics [36.923118950844966]
Graph Neural Networks (GNNs) and Transformers share significant similarities in their encoding strategies for interacting with features from nodes of interest. In this paper, we first explore the similarities and differences between GNNs and Transformers, specifically in terms of relative positions. We conduct extensive experiments on a large-scale position-agnostic dataset, single-cell transcriptomics, finding that GNNs achieve competitive performance compared to Transformers.
arXiv Detail & Related papers (2025-07-05T18:37:16Z) - Plain Transformers Can be Powerful Graph Learners [64.50059165186701]
Researchers have attempted to migrate Transformers to graph learning, but most advanced Graph Transformers have strayed far from plain Transformers. This work demonstrates that the plain Transformer architecture can be a powerful graph learner.
arXiv Detail & Related papers (2025-04-17T02:06:50Z) - GTC: GNN-Transformer Co-contrastive Learning for Self-supervised Heterogeneous Graph Representation [0.9249657468385781]
This paper proposes a collaborative learning scheme for GNN-Transformer and constructs GTC architecture.
For the Transformer branch, we propose Metapath-aware Hop2Token and CG-Hetphormer, which can cooperate with GNN to attentively encode neighborhood information from different levels.
Experiments on real datasets show that GTC exhibits superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-03-22T12:22:44Z) - Graph Transformers without Positional Encodings [0.7252027234425334]
We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph.
We empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks.
arXiv Detail & Related papers (2024-01-31T12:33:31Z) - TransGNN: Harnessing the Collaborative Power of Transformers and Graph Neural Networks for Recommender Systems [31.922581157563272]
Graph Neural Networks (GNNs) have emerged as promising solutions for collaborative filtering (CF).
We propose TransGNN, a novel model that integrates Transformer and GNN layers in an alternating fashion to mutually enhance their capabilities.
arXiv Detail & Related papers (2023-08-28T07:03:08Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a single attention layer can bring surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers suffer from the bottleneck of improving performance by increasing depth.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
arXiv Detail & Related papers (2023-03-01T15:22:40Z) - Relational Attention: Generalizing Transformers for Graph-Structured Tasks [0.8702432681310401]
Transformers operate over sets of real-valued vectors representing task-specific entities and their attributes.
But as set processors, transformers are at a disadvantage in reasoning over more general graph-structured data.
We generalize transformer attention to consider and update edge vectors in each transformer layer.
arXiv Detail & Related papers (2022-10-11T00:25:04Z) - Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines, along with competitive results.
arXiv Detail & Related papers (2022-07-06T08:13:06Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z) - MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens [129.10351459066501]
We propose a specialized token for each region that serves as a messenger (MSG).
By manipulating these MSG tokens, one can flexibly exchange visual information across regions.
We then integrate the MSG token into a multi-scale architecture named MSG-Transformer.
arXiv Detail & Related papers (2021-05-31T17:16:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.