Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs
- URL: http://arxiv.org/abs/2502.12352v1
- Date: Mon, 17 Feb 2025 22:35:16 GMT
- Title: Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs
- Authors: Batu El, Deepro Choudhury, Pietro Liò, Chaitanya K. Joshi
- Abstract summary: We introduce Attention Graphs, a new tool for mechanistic interpretability of Graph Neural Networks (GNNs) and Graph Transformers.
Attention Graphs aggregate attention matrices across Transformer layers and heads to describe how information flows among input nodes.
- Abstract: We introduce Attention Graphs, a new tool for mechanistic interpretability of Graph Neural Networks (GNNs) and Graph Transformers based on the mathematical equivalence between message passing in GNNs and the self-attention mechanism in Transformers. Attention Graphs aggregate attention matrices across Transformer layers and heads to describe how information flows among input nodes. Through experiments on homophilous and heterophilous node classification tasks, we analyze Attention Graphs from a network science perspective and find that: (1) When Graph Transformers are allowed to learn the optimal graph structure using all-to-all attention among input nodes, the Attention Graphs learned by the model do not tend to correlate with the input/original graph structure; and (2) For heterophilous graphs, different Graph Transformer variants can achieve similar performance while utilising distinct information flow patterns. Open source code: https://github.com/batu-el/understanding-inductive-biases-of-gnns
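To make the aggregation step concrete, below is a minimal sketch (Python/NumPy) of one plausible way to build an Attention Graph: it assumes per-layer attention tensors of shape (num_heads, num_nodes, num_nodes), averages heads within each layer, and composes layers by matrix multiplication in an attention-rollout style. The function name, the head-averaging, and the rollout-style composition are illustrative assumptions rather than the paper's exact aggregation procedure.

```python
# Minimal sketch (illustrative, not the authors' exact implementation):
# aggregate per-layer, per-head attention matrices into a single
# "Attention Graph" describing information flow among input nodes.
import numpy as np

def attention_graph(per_layer_attn):
    """per_layer_attn: list of arrays, each of shape (num_heads, n, n),
    where each row of an attention matrix sums to 1."""
    n = per_layer_attn[0].shape[-1]
    flow = np.eye(n)                          # identity: each node starts by "attending" to itself
    for layer_attn in per_layer_attn:
        head_avg = layer_attn.mean(axis=0)    # assumption: average attention over heads
        flow = head_avg @ flow                # assumption: compose layers by matrix product (rollout style)
    return flow                               # (n, n); entry [i, j] ~ influence of node j on node i

# Toy usage: three layers, four heads, five nodes with random row-stochastic attention.
rng = np.random.default_rng(0)
attn = [rng.dirichlet(np.ones(5), size=(4, 5)) for _ in range(3)]
A = attention_graph(attn)
print(A.shape, A.sum(axis=1))  # rows of the aggregated graph stay (approximately) normalized
```

The resulting row-stochastic matrix can then be treated as a weighted directed graph and analysed with network-science tools, for example by comparing it against the input adjacency structure as in the experiments described above.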
Related papers
- Graph Transformers: A Survey [15.68583521879617]
Graph transformers are a recent advancement in machine learning, offering a new class of neural network models for graph-structured data.
This survey provides an in-depth review of recent progress and challenges in graph transformer research.
arXiv Detail & Related papers (2024-07-13T05:15:24Z) - Topology-Informed Graph Transformer [7.857955053895979]
Topology-Informed Graph Transformer (TIGT) is a novel transformer that enhances both the discriminative power in detecting graph isomorphisms and the overall performance of Graph Transformers.
TIGT consists of four components, including a topological positional embedding layer that uses non-isomorphic universal covers based on cyclic subgraphs to ensure unique graph representations.
TIGT outperforms previous Graph Transformers on a synthetic dataset designed to distinguish isomorphism classes of graphs.
arXiv Detail & Related papers (2024-02-03T03:17:44Z) - Gramformer: Learning Crowd Counting via Graph-Modulated Transformer [68.26599222077466]
Gramformer is a graph-modulated transformer that enhances the network by adjusting the attention and the input node features, respectively.
A feature-based encoding is proposed to discover the centrality positions or importance of nodes.
Experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method.
arXiv Detail & Related papers (2024-01-08T13:01:54Z) - Transformers as Graph-to-Graph Models [13.630495199720423]
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case.
Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions.
arXiv Detail & Related papers (2023-10-27T07:21:37Z) - Diffusing Graph Attention [15.013509382069046]
We develop a new model for Graph Transformers that integrates the arbitrary graph structure into the architecture.
Graph Diffuser (GD) learns to extract structural and positional relationships between distant nodes in the graph, which it then uses to direct the Transformer's attention and node representations.
Experiments on eight benchmarks show Graph Diffuser to be a highly competitive model, outperforming the state of the art across a diverse set of domains.
arXiv Detail & Related papers (2023-03-01T16:11:05Z) - Attending to Graph Transformers [5.609943831664869]
Transformer architectures for graphs have emerged as an alternative to established techniques for machine learning with graphs.
Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field.
We probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing.
arXiv Detail & Related papers (2023-02-08T16:40:11Z) - Spectral Augmentations for Graph Contrastive Learning [50.149996923976836]
Contrastive learning has emerged as a premier method for learning representations with or without supervision.
Recent studies have shown its utility in graph representation learning for pre-training.
We propose a set of well-motivated graph transformation operations to provide a bank of candidates when constructing augmentations for a graph contrastive objective.
arXiv Detail & Related papers (2023-02-06T16:26:29Z) - Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and competitive results compared to Transformer variants with sophisticated graph-specific inductive biases.
arXiv Detail & Related papers (2022-07-06T08:13:06Z) - Spectral Graph Convolutional Networks With Lifting-based Adaptive Graph Wavelets [81.63035727821145]
Spectral graph convolutional networks (SGCNs) have been attracting increasing attention in graph representation learning.
We propose a novel class of spectral graph convolutional networks that implement graph convolutions with adaptive graph wavelets.
arXiv Detail & Related papers (2021-08-03T17:57:53Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z) - Dirichlet Graph Variational Autoencoder [65.94744123832338]
We present Dirichlet Graph Variational Autoencoder (DGVAE) with graph cluster memberships as latent factors.
Motivated by the low-pass characteristics of balanced graph cuts, we propose a new GNN variant named Heatts to encode the input graph into cluster memberships.
arXiv Detail & Related papers (2020-10-09T07:35:26Z)