Relational Attention: Generalizing Transformers for Graph-Structured
Tasks
- URL: http://arxiv.org/abs/2210.05062v1
- Date: Tue, 11 Oct 2022 00:25:04 GMT
- Title: Relational Attention: Generalizing Transformers for Graph-Structured
Tasks
- Authors: Cameron Diao and Ricky Loynd
- Abstract summary: Transformers operate over sets of real-valued vectors representing task-specific entities and their attributes.
But as set processors, transformers are at a disadvantage in reasoning over more general graph-structured data.
We generalize transformer attention to consider and update edge vectors in each transformer layer.
- Score: 0.8702432681310401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers flexibly operate over sets of real-valued vectors representing
task-specific entities and their attributes, where each vector might encode one
word-piece token and its position in a sequence, or some piece of information
that carries no position at all. But as set processors, transformers are at a
disadvantage in reasoning over more general graph-structured data where nodes
represent entities and edges represent relations between entities. To address
this shortcoming, we generalize transformer attention to consider and update
edge vectors in each transformer layer. We evaluate this relational transformer
on a diverse array of graph-structured tasks, including the large and
challenging CLRS Algorithmic Reasoning Benchmark. There, it dramatically
outperforms state-of-the-art graph neural networks expressly designed to reason
over graph-structured data. Our analysis demonstrates that these gains are
attributable to relational attention's inherent ability to leverage the greater
expressivity of graphs over sets.
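To make the core idea concrete, below is a minimal single-head sketch of the kind of relational attention the abstract describes: attention scores and values are built from both node and edge vectors, and the edge vectors themselves are updated in every layer. The specific parameterization here (separate node and edge projections, a small MLP for the edge update, residual updates) is an assumption for illustration, not the authors' exact formulation.

```python
# Minimal single-head sketch of relational attention: scores and values depend
# on node AND edge vectors, and edges are also updated each layer.
# The parameterization is an illustrative assumption, not the paper's exact one.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationalAttention(nn.Module):
    def __init__(self, d_node: int, d_edge: int, d_head: int):
        super().__init__()
        # Queries come from the receiving node; keys/values mix the sending
        # node with the edge connecting the pair.
        self.q_node = nn.Linear(d_node, d_head)
        self.k_node = nn.Linear(d_node, d_head)
        self.k_edge = nn.Linear(d_edge, d_head)
        self.v_node = nn.Linear(d_node, d_head)
        self.v_edge = nn.Linear(d_edge, d_head)
        self.node_out = nn.Linear(d_head, d_node)
        # Edge update: a small MLP over the two endpoint nodes and the edge.
        self.edge_update = nn.Sequential(
            nn.Linear(2 * d_node + d_edge, d_edge), nn.ReLU(),
            nn.Linear(d_edge, d_edge),
        )

    def forward(self, nodes: torch.Tensor, edges: torch.Tensor):
        # nodes: (n, d_node); edges: (n, n, d_edge) for a fully connected graph.
        n, _ = nodes.shape
        q = self.q_node(nodes)                                    # (n, d_head)
        k = self.k_node(nodes)[None, :, :] + self.k_edge(edges)   # (n, n, d_head)
        v = self.v_node(nodes)[None, :, :] + self.v_edge(edges)   # (n, n, d_head)
        scores = (q[:, None, :] * k).sum(-1) / k.shape[-1] ** 0.5  # (n, n)
        attn = F.softmax(scores, dim=-1)
        node_msg = (attn[:, :, None] * v).sum(dim=1)               # (n, d_head)
        new_nodes = nodes + self.node_out(node_msg)    # residual node update
        # Update every edge vector from its endpoints and its previous value.
        pair = torch.cat(
            [nodes[:, None, :].expand(n, n, -1),
             nodes[None, :, :].expand(n, n, -1), edges], dim=-1)
        new_edges = edges + self.edge_update(pair)     # residual edge update
        return new_nodes, new_edges
```

For example, `layer = RelationalAttention(d_node=64, d_edge=32, d_head=64)` applied to `(torch.randn(5, 64), torch.randn(5, 5, 32))` returns updated node and edge tensors of the same shapes.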
Related papers
- Simplifying Graph Transformers [64.50059165186701]
We propose three simple modifications to the plain Transformer to render it applicable to graphs without introducing major architectural distortions.
Specifically, we advocate for the use of (1) simplified $L_2$ attention to measure the magnitude closeness of tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a relative positional encoding bias with a shared encoder.
arXiv Detail & Related papers (2025-04-17T02:06:50Z) - SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer attention can be reduced to one-layer propagation with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - Graph Transformers without Positional Encodings [0.7252027234425334]
We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph.
We empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks.
arXiv Detail & Related papers (2024-01-31T12:33:31Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a single attention layer can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone opens up a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers hit a bottleneck when trying to improve performance by increasing depth.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
arXiv Detail & Related papers (2023-03-01T15:22:40Z) - Transformers over Directed Acyclic Graphs [6.263470141349622]
We study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs.
We show that it is effective in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.
arXiv Detail & Related papers (2022-10-24T12:04:52Z) - Deformable Graph Transformer [31.254872949603982]
We propose Deformable Graph Transformer (DGT) that performs sparse attention with dynamically sampled key and value pairs.
Experiments demonstrate that our novel graph Transformer consistently outperforms existing Transformer-based models.
arXiv Detail & Related papers (2022-06-29T00:23:25Z) - Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks.
We first disassemble the existing models and identify three typical ways of incorporating graph information into the vanilla Transformer.
Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z) - Edge-augmented Graph Transformers: Global Self-attention is Enough for
Graphs [24.796242917673755]
We propose a simple yet powerful extension to the transformer - residual edge channels.
The resultant framework, which we call Edge-augmented Graph Transformer (EGT), can directly accept, process and output structural information as well as node information.
Our framework, which relies on global node feature aggregation, achieves better performance than Graph Convolutional Networks (GCNs).
arXiv Detail & Related papers (2021-08-07T02:18:11Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight is that to utilize Transformers on graphs, the structural information of the graph must be effectively encoded into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z) - A Generalization of Transformer Networks to Graphs [5.736353542430439]
We introduce a graph transformer with four new properties compared to the standard model.
The architecture is extended to edge feature representation, which can be critical to tasks such as chemistry (bond types) or link prediction (entity relationships in knowledge graphs); a minimal sketch of this edge-aware attention pattern appears after this list.
arXiv Detail & Related papers (2020-12-17T16:11:47Z)
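Several of the entries above (e.g. the residual edge channels of EGT, Graphormer's structural encodings, and the edge feature representation in the last paper) fold relational information into otherwise standard self-attention rather than updating edge vectors. A common pattern, sketched below under the assumption of a single scalar bias per node pair, is to add a learned edge-dependent bias to the attention logits while leaving the edge representations fixed; this simplification is ours, not any one paper's exact mechanism.

```python
# Minimal sketch of edge-aware attention where edge features only bias the
# attention logits and are never updated. The scalar bias projection is an
# illustrative assumption, not a specific paper's mechanism.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EdgeBiasedAttention(nn.Module):
    def __init__(self, d_node: int, d_edge: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_node, d_head)
        self.k = nn.Linear(d_node, d_head)
        self.v = nn.Linear(d_node, d_head)
        self.edge_bias = nn.Linear(d_edge, 1)  # one scalar bias per node pair
        self.out = nn.Linear(d_head, d_node)

    def forward(self, nodes: torch.Tensor, edges: torch.Tensor):
        # nodes: (n, d_node); edges: (n, n, d_edge)
        q, k, v = self.q(nodes), self.k(nodes), self.v(nodes)
        logits = q @ k.T / k.shape[-1] ** 0.5                 # content term
        logits = logits + self.edge_bias(edges).squeeze(-1)   # relational bias
        attn = F.softmax(logits, dim=-1)
        return nodes + self.out(attn @ v)  # nodes updated; edges stay fixed
```

In contrast to the relational attention sketch above, the edge vectors here only steer the attention pattern and are never refined by the layer.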