Graph Inductive Biases in Transformers without Message Passing
- URL: http://arxiv.org/abs/2305.17589v1
- Date: Sat, 27 May 2023 22:26:27 GMT
- Title: Graph Inductive Biases in Transformers without Message Passing
- Authors: Liheng Ma, Chen Lin, Derek Lim, Adriana Romero-Soriano, Puneet K.
Dokania, Mark Coates, Philip Torr, Ser-Nam Lim
- Abstract summary: New Graph Inductive bias Transformer (GRIT) incorporates graph inductive biases without using message passing.
GRIT achieves state-of-the-art empirical performance across a variety of graph datasets.
- Score: 47.238185813842996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers for graph data are increasingly widely studied and successful in
numerous learning tasks. Graph inductive biases are crucial for Graph
Transformers, and previous works incorporate them using message-passing modules
and/or positional encodings. However, Graph Transformers that use
message-passing inherit known issues of message-passing, and differ
significantly from Transformers used in other domains, thus making transfer of
research advances more difficult. On the other hand, Graph Transformers without
message-passing often perform poorly on smaller datasets, where inductive
biases are more crucial. To bridge this gap, we propose the Graph Inductive
bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph
inductive biases without using message passing. GRIT is based on several
architectural changes that are each theoretically and empirically justified,
including: learned relative positional encodings initialized with random walk
probabilities, a flexible attention mechanism that updates node and node-pair
representations, and injection of degree information in each layer. We prove
that GRIT is expressive -- it can express shortest path distances and various
graph propagation matrices. GRIT achieves state-of-the-art empirical
performance across a variety of graph datasets, thus showing the power that
Graph Transformers without message-passing can deliver.
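A minimal sketch, not the authors' released implementation, of two ingredients named in the abstract: relative positional encodings initialized with k-step random-walk probabilities, and a per-layer degree injection. The function names (random_walk_pe, degree_injection), shapes, and default values are illustrative assumptions.

```python
import numpy as np

def random_walk_pe(adj: np.ndarray, num_steps: int = 8) -> np.ndarray:
    """Stack k-step random-walk probabilities for every node pair.

    adj: (n, n) adjacency matrix.
    Returns an (n, n, num_steps) array whose [i, j, k] entry is the
    probability that a k-step random walk from node i lands on node j
    (k = 0 gives the identity).
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    M = adj / np.clip(deg, 1.0, None)      # row-stochastic D^{-1} A
    powers = [np.eye(n)]
    for _ in range(num_steps - 1):
        powers.append(powers[-1] @ M)      # M^k
    return np.stack(powers, axis=-1)

def degree_injection(x: np.ndarray, adj: np.ndarray,
                     theta1: float = 1.0, theta2: float = 1.0) -> np.ndarray:
    """Illustrative degree scaler: mix node states with a log-degree term."""
    log_deg = np.log1p(adj.sum(axis=1, keepdims=True))
    return x * theta1 + log_deg * x * theta2

# Toy usage on a 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = random_walk_pe(A, num_steps=4)         # node-pair (relative) PE init
node_pe = np.einsum('iik->ik', P)          # diagonal -> per-node PE init
h = degree_injection(np.random.randn(4, 16), A)
```

In the paper these quantities only initialize learned encodings that are then updated by the attention mechanism alongside the node representations; the sketch above covers only the initialization step.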
Related papers
- Graph Transformers Dream of Electric Flow [72.06286909236827]
We show that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical graph problems, such as computing electric flow.
We present explicit weight configurations for implementing each such graph algorithm, and we bound the errors of the constructed Transformers by the errors of the underlying algorithms.
arXiv Detail & Related papers (2024-10-22T05:11:45Z) - SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - Graph Transformers: A Survey [15.68583521879617]
Graph transformers are a recent advancement in machine learning, offering a new class of neural network models for graph-structured data.
This survey provides an in-depth review of recent progress and challenges in graph transformer research.
arXiv Detail & Related papers (2024-07-13T05:15:24Z) - Transformers over Directed Acyclic Graphs [6.263470141349622]
We study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs.
We show that these adaptations are effective in making graph transformers generally outperform graph neural networks tailored to DAGs, and in improving state-of-the-art graph transformer performance in terms of both quality and efficiency.
arXiv Detail & Related papers (2022-10-24T12:04:52Z) - Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and competitive results against Transformer variants with sophisticated graph-specific inductive biases.
arXiv Detail & Related papers (2022-07-06T08:13:06Z) - Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks.
We first disassemble the existing models and identify three typical ways of incorporating graph information into the vanilla Transformer.
Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight is that utilizing Transformers on graphs requires effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z) - A Generalization of Transformer Networks to Graphs [5.736353542430439]
We introduce a graph transformer with four new properties compared to the standard model.
The architecture is extended to edge feature representation, which can be critical to tasks such as chemistry (bond type) or link prediction (entity relationships in knowledge graphs).
arXiv Detail & Related papers (2020-12-17T16:11:47Z)