Rethinking Graph Transformers with Spectral Attention
- URL: http://arxiv.org/abs/2106.03893v2
- Date: Wed, 9 Jun 2021 01:24:23 GMT
- Title: Rethinking Graph Transformers with Spectral Attention
- Authors: Devin Kreuzer, Dominique Beaini, William L. Hamilton, Vincent Létourneau and Prudencio Tossou
- Abstract summary: We present the Spectral Attention Network (SAN), which uses a learned positional encoding (LPE) to learn the position of each node in a given graph.
By leveraging the full spectrum of the Laplacian, our model is theoretically powerful in distinguishing graphs, and can better detect similar sub-structures from their resonance.
Our model performs on par or better than state-of-the-art GNNs, and outperforms any attention-based model by a wide margin.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, the Transformer architecture has proven to be very
successful in sequence processing, but its application to other data
structures, such as graphs, has remained limited due to the difficulty of
properly defining positions. Here, we present the Spectral Attention
Network (SAN), which uses a learned positional encoding (LPE) that can take
advantage of the full Laplacian spectrum to learn the position of each node in
a given graph. This LPE is then added to the node features of the graph and
passed to a fully-connected Transformer. By leveraging the full spectrum of the
Laplacian, our model is theoretically powerful in distinguishing graphs, and
can better detect similar sub-structures from their resonance. Further, by
fully connecting the graph, the Transformer does not suffer from
over-squashing, an information bottleneck of most GNNs, and enables better
modeling of physical phenomena such as heat transfer and electric
interaction. When tested empirically on a set of 4 standard datasets, our model
performs on par or better than state-of-the-art GNNs, and outperforms any
attention-based model by a wide margin, becoming the first fully-connected
architecture to perform well on graph benchmarks.
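The first step described in the abstract, turning the Laplacian spectrum into per-node positional inputs, can be illustrated with a minimal sketch. It rests on our own assumptions rather than the authors' code: it uses the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, a small hand-written Jacobi eigensolver so the example needs only the standard library, and the helper names `normalized_laplacian`, `jacobi_eigh` and `spectral_tokens` are hypothetical. It stops at building, for each node, the list of (eigenvalue, eigenvector component) pairs; the learned network that the LPE applies to that list before adding the result to the node features is not reproduced.

```python
import math

# Hypothetical helpers for illustration; not the SAN reference implementation.

def normalized_laplacian(n, edges):
    """Assumed variant: L = I - D^{-1/2} A D^{-1/2} for a simple undirected graph."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if deg[i] > 0:
            lap[i][i] = 1.0  # isolated nodes keep a zero diagonal
        for j in range(n):
            if adj[i][j]:
                lap[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return lap


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues ascending, matching unit eigenvectors)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V P accumulates eigenvectors as columns
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def spectral_tokens(n, edges, m):
    """For every node j, the pairs (lambda_i, phi_i[j]) over the m smallest eigenvalues."""
    vals, vecs = jacobi_eigh(normalized_laplacian(n, edges))
    m = min(m, n)
    return [[(vals[i], vecs[i][j]) for i in range(m)] for j in range(n)]


# Example: the 4-cycle 0-1-2-3-0 has normalized-Laplacian spectrum {0, 1, 1, 2}.
tokens = spectral_tokens(4, [(0, 1), (1, 2), (2, 3), (3, 0)], m=2)
for j, pairs in enumerate(tokens):
    print(j, [(round(lam, 3), round(phi, 3)) for lam, phi in pairs])
```

The spectrum is computed once per graph, so it is a preprocessing cost rather than a per-layer one. Eigenvectors are only defined up to sign (and, for repeated eigenvalues such as the two 1s of the 4-cycle, up to a rotation within the eigenspace), so any learned encoder over these pairs has to tolerate that ambiguity; one common remedy is to flip eigenvector signs at random during training.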
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
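The random-walk context sampling mentioned for GSPT can be illustrated with a short sketch. It assumes uniform, unweighted walks of a fixed maximum length; GSPT's actual sampler, and how it orders or truncates contexts, may differ, and the names `build_adjacency` and `sample_context` are hypothetical. Each sampled node sequence is the kind of input a sequence Transformer would then read, with the graph structure acting only as a prior on which nodes appear together.

```python
import random

def build_adjacency(n, edges):
    """Undirected adjacency lists for nodes 0..n-1 (hypothetical helper)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def sample_context(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # isolated node: the walk cannot continue
            break
        walk.append(rng.choice(neighbors))
    return walk


adj = build_adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
rng = random.Random(0)  # seeded so the sampled contexts are reproducible
contexts = [sample_context(adj, node, length=4, rng=rng) for node in range(5)]
print(contexts)
```

Seeding a dedicated `random.Random` instance keeps sampling reproducible without touching the global random state.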
Date: 2024-06-19T22:30:08Z
- What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
Date: 2024-06-04T05:30:16Z
- Technical Report: The Graph Spectral Token -- Enhancing Graph Transformers with Spectral Information
Graph Transformers have emerged as a powerful alternative to Message-Passing Graph Neural Networks (MP-GNNs).
We propose the Graph Spectral Token, a novel approach to directly encode graph spectral information.
We benchmark the effectiveness of our approach by enhancing two existing graph transformers, GraphTrans and SubFormer.
Date: 2024-04-08T15:24:20Z
- Graph Transformers without Positional Encodings
We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph.
We empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks.
Date: 2024-01-31T12:33:31Z
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations
We show that a single attention layer can yield surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone opens a new technical path of independent interest for building Transformers on large graphs.
Date: 2023-06-19T08:03:25Z
- Stable and Transferable Hyper-Graph Neural Networks
We introduce an architecture for processing signals supported on hypergraphs via graph neural networks (GNNs).
We provide a framework for bounding the stability and transferability error of GNNs across arbitrary graphs via spectral similarity.
Date: 2022-11-11T23:44:20Z
- Transformers over Directed Acyclic Graphs
We study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs.
We show that the approach makes graph transformers generally outperform graph neural networks tailored to DAGs and improves SOTA graph transformer performance in both quality and efficiency.
Date: 2022-10-24T12:04:52Z
- Deformable Graph Transformer
We propose Deformable Graph Transformer (DGT) that performs sparse attention with dynamically sampled key and value pairs.
Experiments demonstrate that our novel graph Transformer consistently outperforms existing Transformer-based models.
Date: 2022-06-29T00:23:25Z
- Graph Neural Networks with Learnable Structural and Positional Representations
A major issue with arbitrary graphs is the absence of canonical positional information of nodes.
We introduce Positional Encoding (PE) of nodes and inject it into the input layer, as in Transformers.
We observe a performance increase for molecular datasets, from 2.87% up to 64.14% when considering learnable PE for both GNN classes.
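One positional-encoding initialization associated with this line of work is the random-walk PE (RWPE), in which node i is described by its return probabilities (RW)_ii, (RW^2)_ii, ..., (RW^k)_ii with RW = A D^{-1}. The sketch below computes only that fixed initial encoding in plain Python; the learnable part (updating the PE through the network's layers) is not shown, and `random_walk_pe` is a hypothetical name.

```python
def random_walk_pe(n, edges, k):
    """Return, for each node i, [(RW^1)_ii, ..., (RW^k)_ii] with RW = A D^{-1}."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    # Column j of A is divided by deg(j); isolated nodes get a zero column.
    rw = [[adj[i][j] / deg[j] if deg[j] else 0.0 for j in range(n)] for i in range(n)]
    pe = [[] for _ in range(n)]
    power = [row[:] for row in rw]  # current power RW^t, starting at t = 1
    for _ in range(k):
        for i in range(n):
            pe[i].append(power[i][i])  # probability of returning to i after t steps
        power = [[sum(power[i][l] * rw[l][j] for l in range(n)) for j in range(n)]
                 for i in range(n)]
    return pe


triangle = random_walk_pe(3, [(0, 1), (1, 2), (2, 0)], k=3)
path = random_walk_pe(3, [(0, 1), (1, 2)], k=3)
print(triangle[0], path[0], path[1])  # [0.0, 0.5, 0.25] [0.0, 0.5, 0.0] [0.0, 1.0, 0.0]
```

Unlike Laplacian eigenvectors, these return probabilities carry no sign ambiguity. The triangle and the path already differ at step 3, because odd-length returns are impossible in a bipartite graph.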
Date: 2021-10-15T05:59:15Z
- Do Transformers Really Perform Bad for Graph Representation?
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight for utilizing the Transformer on graphs is the necessity of effectively encoding the structural information of a graph into the model.
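One way Graphormer encodes structure is a spatial encoding: every node pair receives a learnable scalar, indexed by the pair's shortest-path distance, that is added to the attention logit before the softmax. The sketch below shows only that mechanism in plain Python; the bias values stand in for learned parameters and are made up, the handling of unreachable pairs is our own choice, and the function names are hypothetical. Graphormer's other structural terms (such as its degree-based centrality encoding and its edge encoding) are omitted.

```python
import math
from collections import deque

def shortest_path_distances(n, edges):
    """All-pairs hop distances by BFS from every node; -1 marks unreachable pairs."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[-1] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[s][w] < 0:
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    return dist


def biased_attention(logits, dist, bias_table, unreachable_bias):
    """Row-wise softmax of logits[i][j] + b(dist[i][j]); b is looked up by distance."""
    weights = []
    for i, row in enumerate(logits):
        scores = []
        for j, s in enumerate(row):
            d = dist[i][j]
            b = unreachable_bias if d < 0 else bias_table[min(d, len(bias_table) - 1)]
            scores.append(s + b)
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


# Path 0-1-2-3 plus an isolated node 4; equal logits isolate the effect of the bias.
dist = shortest_path_distances(5, [(0, 1), (1, 2), (2, 3)])
table = [1.0, 0.5, 0.0, -0.5]  # hypothetical "learned" bias per distance 0, 1, 2, >=3
w = biased_attention([[0.0] * 5 for _ in range(5)], dist, table, unreachable_bias=-1e9)
print(dist[0], [round(x, 3) for x in w[0]])
```

Because the bias depends only on graph distance, the attention stays global while still preferring structurally close nodes when the learned table favours them.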
Date: 2021-06-09T17:18:52Z
- A Generalization of Transformer Networks to Graphs
We introduce a graph transformer with four new properties compared to the standard model.
The architecture is extended to edge feature representation, which can be critical to tasks such as chemistry (bond type) or link prediction (entity relationship in knowledge graphs).
Date: 2020-12-17T16:11:47Z
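The contrast between this graph transformer, whose attention is restricted to each node's neighbors, and the fully connected Transformer described in the SAN abstract can be shown with one toy attention function. This is a minimal sketch, assuming queries, keys and values are the raw node features (a single head, no learned projections) and that the neighbor mask includes self-loops; the names `self_attention` and `neighbor_mask` are hypothetical. With the mask, a node reads only from its direct neighbors in one layer; without it, a change at any node reaches every other node in a single step, which is the property the SAN abstract credits with avoiding over-squashing.

```python
import math

def self_attention(x, mask=None):
    """Single-head scaled dot-product attention with Q = K = V = x.
    mask[i][j] is truthy where node i may attend to node j; None means fully connected."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            s = sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            scores.append(s if mask is None or mask[i][j] else float("-inf"))
        m = max(scores)  # finite as long as every row allows at least one j
        w = [math.exp(s - m) for s in scores]  # masked entries become exp(-inf) = 0.0
        z = sum(w)
        out.append([sum(w[j] * x[j][k] for j in range(n)) / z for k in range(d)])
    return out


def neighbor_mask(n, edges):
    """Adjacency with self-loops, as a boolean attention mask."""
    mask = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        mask[u][v] = mask[v][u] = True
    return mask


edges = [(0, 1), (1, 2), (2, 3)]  # path graph 0-1-2-3
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
x_changed = [row[:] for row in x]
x_changed[3] = [-2.0, 3.0]  # perturb only node 3, three hops from node 0

mask = neighbor_mask(4, edges)
local_same = self_attention(x, mask)[0] == self_attention(x_changed, mask)[0]
full_same = self_attention(x)[0] == self_attention(x_changed)[0]
print(local_same, full_same)  # True False
```

Under the neighbor mask, information from node 3 would need three stacked layers to reach node 0, whereas full attention delivers it in one.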