Attending to Graph Transformers
- URL: http://arxiv.org/abs/2302.04181v3
- Date: Thu, 28 Mar 2024 19:35:01 GMT
- Title: Attending to Graph Transformers
- Authors: Luis Müller, Mikhail Galkin, Christopher Morris, Ladislav Rampášek
- Abstract summary: Transformer architectures for graphs have emerged as an alternative to established techniques for machine learning with graphs.
Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field.
We probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing.
- Score: 5.609943831664869
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, transformer architectures for graphs have emerged as an alternative to established techniques for machine learning with graphs, such as (message-passing) graph neural networks. So far, they have shown promising empirical results, e.g., on molecular prediction datasets, often attributed to their ability to circumvent graph neural networks' shortcomings, such as over-smoothing and over-squashing. Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field. We overview their theoretical properties, survey structural and positional encodings, and discuss extensions for important graph classes, e.g., 3D molecular graphs. Empirically, we probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing. Further, we outline open challenges and research directions to stimulate future work. Our code is available at https://github.com/luis-mueller/probing-graph-transformers.
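As a concrete illustration of the structural and positional encodings the survey covers, the following is a minimal NumPy sketch of Laplacian eigenvector positional encodings, one commonly used scheme; the function name and the toy graph are illustrative and not taken from the paper's code.

```python
import numpy as np

def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Return k non-trivial eigenvectors of the normalized graph Laplacian
    as node-level positional encodings (shape: n_nodes x k)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetrically normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)   # columns sorted by ascending eigenvalue
    # Drop the first eigenvector (eigenvalue 0), keep the next k
    return eigvecs[:, 1:k + 1]

# Toy usage: a 4-cycle
adj = np.array([[0., 1., 0., 1.],
                [1., 0., 1., 0.],
                [0., 1., 0., 1.],
                [1., 0., 1., 0.]])
print(laplacian_positional_encoding(adj, k=2).shape)  # (4, 2)
```

Such eigenvectors are only defined up to sign (and up to basis for repeated eigenvalues), so in practice random sign flips or sign-invariant architectures are typically applied on top of this encoding.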
Related papers
- Graph Transformers: A Survey [15.68583521879617]
Graph transformers are a recent advancement in machine learning, offering a new class of neural network models for graph-structured data.
This survey provides an in-depth review of recent progress and challenges in graph transformer research.
arXiv Detail & Related papers (2024-07-13T05:15:24Z)
- Transformers as Graph-to-Graph Models [13.630495199720423]
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case.
Our Graph-to-Graph Transformer architecture makes this ability explicit by feeding graph edges into the attention weight computations and predicting graph edges with attention-like functions (a minimal sketch of edge-conditioned attention follows this entry).
arXiv Detail & Related papers (2023-10-27T07:21:37Z)
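The entry above describes computing attention weights that take graph edges as input. Below is a generic, hypothetical PyTorch sketch of that idea, a single attention head whose logits receive an additive learned bias per edge label; it is not the authors' Graph-to-Graph Transformer implementation.

```python
import torch
import torch.nn as nn

class EdgeBiasedAttention(nn.Module):
    """Single-head attention whose logits get an additive bias derived from
    integer edge labels (0 = no edge): a generic stand-in for attention that
    is conditioned on graph edges."""
    def __init__(self, dim: int, num_edge_labels: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.edge_bias = nn.Embedding(num_edge_labels, 1)  # one scalar bias per edge label
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, edge_labels: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) node features; edge_labels: (n, n) integer labels
        logits = (self.q(x) @ self.k(x).T) * self.scale
        logits = logits + self.edge_bias(edge_labels).squeeze(-1)  # inject edge information
        return logits.softmax(dim=-1) @ self.v(x)

# Toy usage
x, edges = torch.randn(5, 16), torch.randint(0, 3, (5, 5))
print(EdgeBiasedAttention(16, num_edge_labels=3)(x, edges).shape)  # torch.Size([5, 16])
```

Predicting edges, as the summary also mentions, can analogously reuse pairwise attention-like scores as edge logits; that part is omitted here.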
- Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and updating only the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies (a minimal sketch of this setup follows this entry).
arXiv Detail & Related papers (2023-09-18T20:12:17Z)
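The prompt-tuning recipe above (freeze the pre-trained weights and train only added tokens plus a task head) can be made concrete with a small, hypothetical PyTorch sketch; the wrapper and the stand-in encoder are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Wrap a frozen pre-trained encoder and prepend learnable prompt tokens;
    only the prompts and the task head receive gradient updates."""
    def __init__(self, encoder: nn.Module, dim: int, num_prompts: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze all pre-trained parameters
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (n_tokens, dim); prepend prompts, encode, mean-pool, classify
        h = self.encoder(torch.cat([self.prompts, tokens], dim=0))
        return self.head(h.mean(dim=0))

# Toy usage with a stand-in "pre-trained" encoder
encoder = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
model = PromptedEncoder(encoder, dim=16, num_prompts=4, num_classes=3)
trainable = [p for p in model.parameters() if p.requires_grad]  # prompts + head only
optimizer = torch.optim.Adam(trainable, lr=1e-3)
print(model(torch.randn(10, 16)).shape)  # torch.Size([3])
```

Because only the prompts and the head are optimized, one frozen copy of the backbone can serve many tasks, which is the storage saving the entry refers to.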
- Exphormer: Sparse Transformers for Graphs [5.055213942955148]
We introduce Exphormer, a framework for building powerful and scalable graph transformers.
We show that Exphormer produces models with competitive empirical results on a wide variety of graph datasets.
arXiv Detail & Related papers (2023-03-10T18:59:57Z)
- Spectral Augmentations for Graph Contrastive Learning [50.149996923976836]
Contrastive learning has emerged as a premier method for learning representations with or without supervision.
Recent studies have shown its utility in graph representation learning for pre-training.
We propose a set of well-motivated graph transformation operations that provide a bank of candidate augmentations for a graph contrastive objective (a generic sketch of such an objective follows this entry).
arXiv Detail & Related papers (2023-02-06T16:26:29Z)
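The entry above mentions a bank of candidate augmentations feeding a graph contrastive objective. Below is a generic sketch of that training setup with two placeholder augmentations and a simplified InfoNCE loss; the paper's actual spectral operations are not reproduced here, and all names are illustrative.

```python
import random
import torch
import torch.nn.functional as F

def drop_edges(adj: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    """Placeholder augmentation: randomly drop edges of an undirected graph."""
    mask = torch.triu((torch.rand_like(adj) > p).float(), 1)
    return adj * (mask + mask.T)

def add_self_loops(adj: torch.Tensor) -> torch.Tensor:
    """Another placeholder augmentation, just to populate the bank."""
    return adj + torch.eye(adj.shape[0])

AUGMENTATION_BANK = [drop_edges, add_self_loops]

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Simplified contrastive loss over two views of a batch of graph embeddings
    (batch, dim); positives sit on the diagonal of the cross-view similarity."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau
    return F.cross_entropy(logits, torch.arange(z1.shape[0]))

# Toy usage: two views of one graph drawn from the bank; embeddings are stand-ins
adj = torch.triu((torch.rand(6, 6) > 0.5).float(), 1); adj = adj + adj.T
view1, view2 = random.choice(AUGMENTATION_BANK)(adj), random.choice(AUGMENTATION_BANK)(adj)
z1, z2 = torch.randn(8, 32), torch.randn(8, 32)  # would come from a graph encoder on each view
print(info_nce(z1, z2).item())
```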
- Learning Graph Structure from Convolutional Mixtures [119.45320143101381]
We propose a graph convolutional relationship between the observed and latent graphs, and formulate the graph learning task as a network inverse (deconvolution) problem.
In lieu of eigendecomposition-based spectral methods, we unroll and truncate proximal gradient iterations to arrive at a parameterized neural network architecture that we call a Graph Deconvolution Network (GDN).
GDNs can learn a distribution of graphs in a supervised fashion, perform link prediction or edge-weight regression by adapting the loss function, and are inherently inductive (a minimal sketch of one unrolled iteration follows this entry).
arXiv Detail & Related papers (2022-05-19T14:08:15Z)
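The GDN idea of unrolling truncated proximal-gradient iterations can be illustrated with a simplified sketch. It assumes, purely for illustration, a first-order convolutional model O ≈ h0*I + h1*A between the observed graph O and the latent adjacency A, with a soft-threshold prox for sparsity and learnable step size, threshold, and filter coefficients; the paper's actual parameterization differs.

```python
import torch
import torch.nn as nn

class UnrolledDeconvLayer(nn.Module):
    """One unrolled proximal-gradient step for recovering a latent adjacency A
    from an observed graph O under the simplified model O ≈ h0*I + h1*A."""
    def __init__(self):
        super().__init__()
        self.h0 = nn.Parameter(torch.tensor(0.0))      # filter coefficients (learnable)
        self.h1 = nn.Parameter(torch.tensor(1.0))
        self.step = nn.Parameter(torch.tensor(0.1))    # gradient step size
        self.thresh = nn.Parameter(torch.tensor(0.01)) # soft-threshold level

    def forward(self, a: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        n = a.shape[0]
        grad = self.h1 * (self.h0 * torch.eye(n) + self.h1 * a - obs)  # gradient of the quadratic fit
        a = a - self.step * grad                                       # gradient step
        a = torch.relu(a.abs() - self.thresh) * a.sign()               # soft-threshold (sparsity prox)
        a = 0.5 * (a + a.T)                                            # keep the estimate symmetric
        return a.clamp(min=0)                                          # nonnegative edge weights

# Toy usage: a few stacked (unrolled, truncated) iterations
obs = torch.rand(5, 5); obs = 0.5 * (obs + obs.T)
a = torch.zeros(5, 5)
for layer in nn.ModuleList([UnrolledDeconvLayer() for _ in range(3)]):
    a = layer(a, obs)
print(a.shape)  # torch.Size([5, 5])
```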
- Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks.
We first disassemble the existing models and identify three typical ways of incorporating graph information into the vanilla Transformer.
Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z)
- Graph Self-Attention for learning graph representation with Transformer [13.49645012479288]
We propose a novel Graph Self-Attention module to enable Transformer models to learn graph representation.
We propose context-aware attention which considers the interactions between query, key and graph information.
Our method achieves state-of-the-art performance on multiple benchmarks of graph representation learning.
arXiv Detail & Related papers (2022-01-30T11:10:06Z)
- Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight for applying Transformers to graphs is the necessity of effectively encoding the structural information of a graph into the model (a sketch of one common structural-encoding mechanism follows this entry).
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
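The summary above stresses encoding structural information into the model. One frequently cited ingredient of Graphormer is a learnable attention bias indexed by the shortest-path distance (SPD) between node pairs; below is a minimal single-head sketch of that idea, not the official implementation, with batching and Graphormer's other encodings omitted.

```python
import torch
import torch.nn as nn

class SPDBiasedAttention(nn.Module):
    """Single-head attention whose logits receive a learnable scalar bias
    indexed by the shortest-path distance between node pairs."""
    def __init__(self, dim: int, max_dist: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.spatial_bias = nn.Embedding(max_dist + 1, 1)  # one scalar per SPD value
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, spd: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) node features; spd: (n, n) integer shortest-path distances
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.T) * self.scale + self.spatial_bias(spd).squeeze(-1)
        return logits.softmax(dim=-1) @ v

# Toy usage on a path graph 0-1-2-3, where SPD(i, j) = |i - j|
idx = torch.arange(4)
spd = (idx[:, None] - idx[None, :]).abs()
print(SPDBiasedAttention(dim=8, max_dist=3)(torch.randn(4, 8), spd).shape)  # torch.Size([4, 8])
```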
- Dirichlet Graph Variational Autoencoder [65.94744123832338]
We present Dirichlet Graph Variational Autoencoder (DGVAE) with graph cluster memberships as latent factors.
Motivated by the low-pass characteristics of balanced graph cuts, we propose a new GNN variant named Heatts to encode the input graph into cluster memberships.
arXiv Detail & Related papers (2020-10-09T07:35:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.