Related papers: Towards Principled Graph Transformers

Towards Principled Graph Transformers

URL: http://arxiv.org/abs/2401.10119v4
Date: Fri, 08 Nov 2024 10:06:06 GMT
Title: Towards Principled Graph Transformers
Authors: Luis Müller, Daniel Kusuma, Blai Bonet, Christopher Morris,
Abstract summary: Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. We show that the proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power.
Score: 8.897857788525629
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts

Related papers

Plain Transformers are Surprisingly Powerful Link Predictors [57.01966734467712]
Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies.<n>While Graph Neural Networks (GNNs) are the standard solution, state-of-the-art pipelines often rely on explicit structurals or memory-intensive node embeddings.<n>We present PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs.
arXiv Detail & Related papers (2026-02-02T02:45:52Z)
YuriiFormer: A Suite of Nesterov-Accelerated Transformers [62.40952219538543]
We propose a variational framework that interprets transformer layers as iterations of an optimization algorithm acting on token embeddings.<n>In this view, self-attention implements gradient step of an interaction energy, while layers correspond to gradient updates of a potential energy.<n>Standard GPT-style transformers emerge as vanilla gradient descent on the resulting composite objective, implemented via Lie-Trotter splitting between these two energys.
arXiv Detail & Related papers (2026-01-30T18:06:21Z)
Simplifying Graph Transformers [64.50059165186701]
We propose three simple modifications to the plain Transformer to render it applicable to graphs without introducing major architectural distortions. Specifically, we advocate for the use of (1) simplified $L$ attention to measure the magnitude of closeness tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a relative positional encoding bias with a shared encoder.
arXiv Detail & Related papers (2025-04-17T02:06:50Z)
Homomorphism Counts as Structural Encodings for Graph Learning [7.691872607259055]
Graph Transformers are popular neural networks that extend the well-known Transformer architecture to the graph domain.<n>We propose $textitmotif structural encoding$ (MoSE) as a flexible and powerful structural encoding framework based on counting graph homomorphisms.
arXiv Detail & Related papers (2024-10-24T12:09:01Z)
A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks. GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
Aligning Transformers with Weisfeiler-Leman [5.0452971570315235]
Graph neural network architectures aligned with the $k$-WL hierarchy offer theoretically well-understood expressive power. We develop a theoretical framework that allows the study of established positional encodings such as Laplacian PEs and SPE. We evaluate our transformers on the large-scale PCQM4Mv2 dataset, showing competitive predictive performance with the state-of-the-art.
arXiv Detail & Related papers (2024-06-05T11:06:33Z)
What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks. This paper introduces first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z)
Automatic Graph Topology-Aware Transformer [50.2807041149784]
We build a comprehensive graph Transformer search space with the micro-level and macro-level designs. EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level. We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks.
arXiv Detail & Related papers (2024-05-30T07:44:31Z)
On the Convergence of Encoder-only Shallow Transformers [62.639819460956176]
We build the global convergence theory of encoder-only shallow Transformers under a realistic setting. Our results can pave the way for a better understanding of modern Transformers, particularly on training dynamics.
arXiv Detail & Related papers (2023-11-02T20:03:05Z)
On Structural Expressive Power of Graph Transformers [26.84413369075598]
We introduce textbfSEG-WL test (textbfStructural textbfEncoding enhanced textbfWeisfeiler-textbfLehman test), a generalized graph isomorphism test algorithm. We show how graph Transformers' expressive power is determined by the design of structural encodings.
arXiv Detail & Related papers (2023-05-23T12:12:21Z)
Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers suffer from the bottleneck of improving performance by increasing depth. Deep graph transformers are limited by the vanishing capacity of global attention. We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
arXiv Detail & Related papers (2023-03-01T15:22:40Z)
Deformable Graph Transformer [31.254872949603982]
We propose Deformable Graph Transformer (DGT) that performs sparse attention with dynamically sampled key and value pairs. Experiments demonstrate that our novel graph Transformer consistently outperforms existing Transformer-based models.
arXiv Detail & Related papers (2022-06-29T00:23:25Z)
Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture. Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
Rethinking Graph Transformers with Spectral Attention [13.068288784805901]
We present the $textitSpectral Attention Network$ (SAN), which uses a learned positional encoding (LPE) to learn the position of each node in a given graph. By leveraging the full spectrum of the Laplacian, our model is theoretically powerful in distinguishing graphs, and can better detect similar sub-structures from their resonance. Our model performs on par or better than state-of-the-art GNNs, and outperforms any attention-based model by a wide margin.
arXiv Detail & Related papers (2021-06-07T18:11:11Z)
A Generalization of Transformer Networks to Graphs [5.736353542430439]
We introduce a graph transformer with four new properties compared to the standard model. The architecture is extended to edge feature representation, which can be critical to tasks s.a. chemistry (bond type) or link prediction (entity relationship in knowledge graphs)
arXiv Detail & Related papers (2020-12-17T16:11:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.