Comparing Graph Transformers via Positional Encodings
- URL: http://arxiv.org/abs/2402.14202v4
- Date: Thu, 22 Aug 2024 23:22:33 GMT
- Title: Comparing Graph Transformers via Positional Encodings
- Authors: Mitchell Black, Zhengchao Wan, Gal Mishne, Amir Nayyeri, Yusu Wang
- Abstract summary: The distinguishing power of graph transformers is closely tied to the choice of positional encoding.
There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs).
We show that graph transformers using APEs and RPEs are equivalent in terms of distinguishing power.
- Score: 11.5844121984212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The distinguishing power of graph transformers is closely tied to the choice of positional encoding: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs). APEs assign features to each node and are given as input to the transformer. RPEs instead assign a feature to each pair of nodes, e.g., graph distance, and are used to augment the attention block. A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. Interestingly, we show that graph transformers using APEs and RPEs are equivalent in terms of distinguishing power. In particular, we demonstrate how to interchange APEs and RPEs while maintaining their distinguishing power in terms of graph transformers. Based on our theoretical results, we provide a study on several APEs and RPEs (including the resistance distance and the recently introduced stable and expressive positional encoding (SPE)) and compare their distinguishing power in terms of transformers. We believe our work will help navigate the huge number of choices of positional encoding and will provide guidance on the future design of positional encodings for graph transformers.
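To make the two encoding styles concrete, here is a minimal NumPy sketch (not the paper's construction) of a single attention head used with an APE versus an RPE. Only the roles of the encodings follow the abstract; the dimensions, random weights, and the concatenation-style APE injection are illustrative assumptions.

```python
# Minimal sketch contrasting APEs and RPEs in one attention head.
# APE: per-node positional features appended to the node inputs.
# RPE: per-pair features (e.g. graph distance) added to the attention logits.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_ape(node_feats, ape, Wq, Wk, Wv):
    """APE: per-node positional features are concatenated to the node features."""
    x = np.concatenate([node_feats, ape], axis=-1)          # (n, d + d_pe)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def attention_with_rpe(node_feats, rpe, Wq, Wk, Wv):
    """RPE: per-pair features bias the attention logits."""
    q, k, v = node_feats @ Wq, node_feats @ Wk, node_feats @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1]) + rpe            # rpe is (n, n)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d, d_pe = 5, 8, 4
node_feats = rng.normal(size=(n, d))
ape = rng.normal(size=(n, d_pe))                             # e.g. Laplacian eigenvectors
rpe = rng.integers(0, 4, size=(n, n)).astype(float)          # e.g. shortest-path distances
Wq_a, Wk_a, Wv_a = (rng.normal(size=(d + d_pe, d)) for _ in range(3))
Wq_r, Wk_r, Wv_r = (rng.normal(size=(d, d)) for _ in range(3))
print(attention_with_ape(node_feats, ape, Wq_a, Wk_a, Wv_a).shape)  # (5, 8)
print(attention_with_rpe(node_feats, rpe, Wq_r, Wk_r, Wv_r).shape)  # (5, 8)
```

The paper's claim concerns exactly these two injection points: at the level of distinguishing power, the per-node and per-pair encodings can be converted into one another.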
Related papers
- Graph as Point Set [31.448841287258116]
This paper introduces a novel graph-to-set conversion method that transforms interconnected nodes into a set of independent points.
It enables using set encoders to learn from graphs, thereby significantly expanding the design space of Graph Neural Networks.
To demonstrate the effectiveness of our approach, we introduce Point Set Transformer (PST), a transformer architecture that accepts a point set converted from a graph as input.
arXiv Detail & Related papers (2024-05-05T02:29:41Z)
- Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers face a bottleneck: increasing depth does not keep improving performance.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
arXiv Detail & Related papers (2023-03-01T15:22:40Z)
- Transformers Meet Directed Graphs [18.490890946129284]
Transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains.
In this work, we propose two direction- and structure-aware positional encodings for directed graphs.
We show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding.
arXiv Detail & Related papers (2023-01-31T19:33:14Z)
- A Length-Extrapolatable Transformer [98.54835576985664]
We focus on length extrapolation, i.e., training on short texts while evaluating longer sequences.
We introduce a relative position embedding to explicitly maximize attention resolution.
We evaluate different Transformer variants with language modeling.
arXiv Detail & Related papers (2022-12-20T18:56:20Z)
- Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method coined Tokenized Graph Transformer (TokenGT) achieves significantly better results than GNN baselines and competitive results against Transformer variants with graph-specific inductive biases.
arXiv Detail & Related papers (2022-07-06T08:13:06Z)
- SepTr: Separable Transformer for Audio Spectrogram Processing [74.41172054754928]
We propose a new vision transformer architecture called Separable Transformer (SepTr).
SepTr employs two transformer blocks in a sequential manner: the first attends to tokens within the same frequency bin, and the second attends to tokens within the same time interval (a rough sketch follows this entry).
We conduct experiments on three benchmark data sets, showing that our architecture outperforms conventional vision transformers and other state-of-the-art methods.
arXiv Detail & Related papers (2022-03-17T19:48:43Z)
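As a rough illustration of the separable idea in the SepTr entry above, the sketch below applies plain single-head attention along one axis of a (time, frequency, dim) token grid and then along the other. The shared weights across the two stages and the absence of layer norm / MLP sublayers are simplifications, not the paper's architecture.

```python
# Axis-wise ("separable") attention over spectrogram tokens: first within each
# frequency bin (across time), then within each time interval (across frequency).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def septr_block(tokens, Wq, Wk, Wv):
    """tokens: (time, freq, dim) patch embeddings; weights shared across stages for brevity."""
    # 1) attend over time within each frequency bin
    within_freq = np.stack([self_attention(tokens[:, f], Wq, Wk, Wv)
                            for f in range(tokens.shape[1])], axis=1)
    # 2) attend over frequency within each time interval
    within_time = np.stack([self_attention(within_freq[t], Wq, Wk, Wv)
                            for t in range(within_freq.shape[0])], axis=0)
    return within_time

rng = np.random.default_rng(0)
T, F, D = 6, 4, 8
tokens = rng.normal(size=(T, F, D))
W = [rng.normal(size=(D, D)) for _ in range(3)]
print(septr_block(tokens, *W).shape)   # (6, 4, 8)
```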
- Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding [96.9752763607738]
We propose a novel positional encoding method based on learnable Fourier features (a brief sketch follows this entry).
Our experiments show that our learnable feature representation for multi-dimensional positional encoding outperforms existing methods.
arXiv Detail & Related papers (2021-06-05T04:40:18Z)
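A hedged sketch of the general recipe described in the entry above: multi-dimensional positions are linearly projected by a trainable matrix, mapped through cosine and sine, and passed through a small MLP. The specific dimensions, scaling, and MLP shape here are assumptions for illustration.

```python
# Learnable-Fourier-feature positional encoding for multi-dimensional positions.
import numpy as np

def learnable_fourier_pe(pos, W_r, W1, b1, W2, b2):
    """pos: (n, m) positions in m-dimensional space; W_r, W1, b1, W2, b2 are learnable."""
    proj = pos @ W_r                                                  # (n, f/2)
    feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)     # (n, f)
    feats = feats / np.sqrt(feats.shape[-1])                          # keep scale bounded
    hidden = np.maximum(feats @ W1 + b1, 0.0)                         # small ReLU MLP
    return hidden @ W2 + b2                                           # (n, d_pe)

rng = np.random.default_rng(0)
n, m, f, h, d_pe = 10, 2, 16, 32, 8        # e.g. 2-D pixel coordinates
pos = rng.uniform(0, 1, size=(n, m))
pe = learnable_fourier_pe(pos,
                          rng.normal(size=(m, f // 2)),
                          rng.normal(size=(f, h)), np.zeros(h),
                          rng.normal(size=(h, d_pe)), np.zeros(d_pe))
print(pe.shape)   # (10, 8)
```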
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder that drops the softmax-weighted full attention and keeps only the query-key similarity computation (a rough sketch follows this entry).
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
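The "query-key similarity only" decoder mentioned in the entry above can be illustrated roughly as follows: pairwise query-key similarities between the tokens of two images are pooled directly into a matching score, with no softmax-weighted value aggregation. The max/mean pooling and all dimensions are illustrative choices, not the paper's exact design.

```python
# Cross-image matching from raw query-key similarities (no softmax attention).
import numpy as np

def qk_matching_score(feats_a, feats_b, Wq, Wk):
    """feats_a, feats_b: (n_tokens, dim) token features of two images."""
    q = feats_a @ Wq
    k = feats_b @ Wk
    sim = q @ k.T / np.sqrt(q.shape[-1])   # (n_a, n_b) query-key similarities
    return sim.max(axis=1).mean()          # best match per token of image A, averaged

rng = np.random.default_rng(0)
n, d = 16, 32
score = qk_matching_score(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                          rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(float(score))
```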
- Demystifying the Better Performance of Position Encoding Variants for Transformer [12.503079503907989]
We show how to encode position and segment into Transformer models.
The proposed method performs on par with SOTA on GLUE, XTREME and WMT benchmarks while saving costs.
arXiv Detail & Related papers (2021-04-18T03:44:57Z)
- Do We Really Need Explicit Position Encodings for Vision Transformers? [29.7662570764424]
We propose a conditional position encoding scheme, which is conditioned on the local neighborhood of the input token.
Our new model with PEG is named Conditional Positional encoding Vision Transformer (CPVT) and can naturally process input sequences of arbitrary length.
We demonstrate that CPVT can result in visually similar attention maps and even better performance than those with predefined positional encodings.
arXiv Detail & Related papers (2021-02-22T10:29:55Z)
- A Generalization of Transformer Networks to Graphs [5.736353542430439]
We introduce a graph transformer with four new properties compared to the standard model.
The architecture is extended to edge feature representation, which can be critical for tasks such as chemistry (bond type) or link prediction (entity relationships in knowledge graphs); a rough sketch follows this entry.
arXiv Detail & Related papers (2020-12-17T16:11:47Z)
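For the edge-feature extension described in the last entry, one plausible way (an assumption for illustration, not necessarily the paper's exact formulation) to let edge features such as bond types influence a graph transformer is to project each edge's feature vector to a scalar bias on the corresponding attention logit, while restricting attention to graph neighbors.

```python
# Graph attention with per-edge feature vectors (e.g. one-hot bond types)
# biasing the attention logits; attention is masked to the graph's edges.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def edge_aware_attention(node_feats, edge_feats, adj, Wq, Wk, Wv, w_e):
    """node_feats: (n, d); edge_feats: (n, n, d_e); adj: (n, n) 0/1 adjacency."""
    q, k, v = node_feats @ Wq, node_feats @ Wk, node_feats @ Wv
    edge_bias = edge_feats @ w_e                       # (n, n) scalar bias per node pair
    scores = q @ k.T / np.sqrt(q.shape[-1]) + edge_bias
    scores = np.where(adj > 0, scores, -1e9)           # attend only along edges (and self)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d, d_e = 6, 8, 3                                    # 3 bond types, one-hot encoded
adj = (rng.uniform(size=(n, n)) < 0.4).astype(float)
np.fill_diagonal(adj, 1.0)
edge_feats = np.eye(d_e)[rng.integers(0, d_e, size=(n, n))]
out = edge_aware_attention(rng.normal(size=(n, d)), edge_feats, adj,
                           rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                           rng.normal(size=(d, d)), rng.normal(size=d_e))
print(out.shape)   # (6, 8)
```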