Linear Transformer Topological Masking with Graph Random Features
- URL: http://arxiv.org/abs/2410.03462v2
- Date: Tue, 15 Oct 2024 14:12:38 GMT
- Title: Linear Transformer Topological Masking with Graph Random Features
- Authors: Isaac Reid, Kumar Avinava Dubey, Deepali Jain, Will Whitney, Amr Ahmed, Joshua Ainslie, Alex Bewley, Mithun Jacob, Aranyak Mehta, David Rendleman, Connor Schenck, Richard E. Turner, René Wagner, Adrian Weller, Krzysztof Choromanski
- Abstract summary: We show how to parameterise topological masks as a learnable function of a weighted adjacency matrix.
Our efficient masking algorithms provide strong performance gains for tasks on image and point cloud data.
- Score: 52.717865653036796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When training transformers on graph-structured data, incorporating information about the underlying topology is crucial for good performance. Topological masking, a type of relative position encoding, achieves this by upweighting or downweighting attention depending on the relationship between the query and keys in a graph. In this paper, we propose to parameterise topological masks as a learnable function of a weighted adjacency matrix -- a novel, flexible approach which incorporates a strong structural inductive bias. By approximating this mask with graph random features (for which we prove the first known concentration bounds), we show how this can be made fully compatible with linear attention, preserving $\mathcal{O}(N)$ time and space complexity with respect to the number of input tokens. The fastest previous alternative was $\mathcal{O}(N \log N)$ and only suitable for specific graphs. Our efficient masking algorithms provide strong performance gains for tasks on image and point cloud data, including with $>30$k nodes.
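To make the mechanism concrete, below is a minimal NumPy sketch of linear attention modulated by a low-rank topological mask. It assumes the mask factorises as $M = FF^\top$ for nonnegative features $F \in \mathbb{R}^{N \times m}$ (a stand-in for the paper's graph random features); the feature map `phi` is a generic positive map, not the paper's exact choice.

```python
import numpy as np

def phi(X):
    """Positive feature map (simple stand-in for Performer-style features)."""
    return np.maximum(X, 0.0) + 1e-6

def grf_masked_linear_attention(Q, K, V, F):
    """Linear attention with a low-rank topological mask M = F @ F.T.

    Q, K, V: (N, d) queries/keys/values; F: (N, m) nonnegative graph
    random features. The N x N mask is never materialised.
    """
    Qp, Kp = phi(Q), phi(K)                                  # (N, r)
    # psi_i = f_i (outer) phi(q_i): dot products factorise as
    # psi_i . psi_j = (f_i . f_j) * (phi(q_i) . phi(k_j)) = M_ij * attention.
    Psi_q = np.einsum('nm,nr->nmr', F, Qp).reshape(F.shape[0], -1)
    Psi_k = np.einsum('nm,nr->nmr', F, Kp).reshape(F.shape[0], -1)
    num = Psi_q @ (Psi_k.T @ V)                              # (N, d)
    den = Psi_q @ Psi_k.sum(axis=0)                          # (N,)
    return num / den[:, None]

# Tiny usage example on random data.
rng = np.random.default_rng(0)
N, d, m = 8, 4, 3
out = grf_masked_linear_attention(rng.normal(size=(N, d)),
                                  rng.normal(size=(N, d)),
                                  rng.normal(size=(N, d)),
                                  rng.uniform(size=(N, m)))
assert out.shape == (N, d)
```

Because the $N \times N$ mask is never formed, time and space stay linear in $N$, which is exactly the property the paper's GRF construction is designed to preserve.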
Related papers
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs; a toy sketch of the Gumbel-Softmax idea follows this entry.
arXiv Detail & Related papers (2023-06-14T09:21:15Z)
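For intuition, here is a dense $\mathcal{O}(N^2)$ toy version of differentiable edge sampling with Gumbel-Softmax; the projection matrices `W_q` and `W_k` are hypothetical, and NodeFormer's kernelized operator achieves the same effect without ever forming the $N \times N$ matrix.

```python
import numpy as np

def gumbel_softmax_message_passing(X, W_q, W_k, tau=0.5, rng=None):
    """All-pair message passing with Gumbel-Softmax edge weights (toy, dense).

    Adding Gumbel noise to pairwise scores and normalising yields a
    differentiable sample of a soft adjacency matrix over every node pair.
    """
    rng = rng or np.random.default_rng(0)
    logits = (X @ W_q) @ (X @ W_k).T                   # (N, N) pairwise scores
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))                            # Gumbel(0, 1) noise
    z = (logits + g) / tau
    z -= z.max(axis=1, keepdims=True)                  # numerically stable softmax
    A = np.exp(z)
    A /= A.sum(axis=1, keepdims=True)                  # soft adjacency, rows sum to 1
    return A @ X                                       # propagate node signals
```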
- GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner [28.321233121613112]
Masked graph autoencoders (e.g., GraphMAE) have recently produced promising results.
We present GraphMAE2, a masked self-supervised learning framework designed to overcome the limitations of existing masked graph autoencoders.
We show that GraphMAE2 can consistently generate top results on various public datasets; a minimal sketch of the masked-reconstruction objective follows this entry.
arXiv Detail & Related papers (2023-04-10T17:25:50Z)
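A minimal sketch of the masked-autoencoding objective on graphs, with `encode` and `decode` as placeholder callables; GraphMAE2's specific decoding enhancements (e.g., re-masking) are not reproduced here.

```python
import numpy as np

def masked_node_reconstruction_loss(X, A, encode, decode, mask_rate=0.5, rng=None):
    """Mask node features, encode the graph, and score reconstruction
    only on the masked nodes -- the core masked-graph-autoencoder loss."""
    rng = rng or np.random.default_rng(0)
    masked = rng.uniform(size=X.shape[0]) < mask_rate
    X_in = X.copy()
    X_in[masked] = 0.0                     # stand-in for a learned [MASK] token
    H = encode(A, X_in)                    # node representations
    X_rec = decode(A, H)                   # reconstructed features
    diff = X_rec[masked] - X[masked]
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```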
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity; a toy sketch of the node-sampling idea follows this entry.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
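A toy illustration of the sampling idea: each node aggregates from a small sampled node set instead of the full pairwise graph. Sampling is uniform here, whereas the paper predicts sample locations and filter weights adaptively.

```python
import numpy as np

def sampled_message_passing(X, num_samples=16, rng=None):
    """Each node gathers messages from `num_samples` sampled nodes,
    cutting the cost from O(N^2) to O(N * num_samples)."""
    rng = rng or np.random.default_rng(0)
    N = X.shape[0]
    idx = rng.integers(0, N, size=(N, num_samples))   # per-node sample set
    return X[idx].mean(axis=1)                        # (N, d) aggregated messages
```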
- Neural Topological Ordering for Computation Graphs [23.225391263047364]
We propose an end-to-end machine-learning-based approach for topological ordering using an encoder-decoder framework.
We show that our model outperforms, or is on par with, several topological ordering baselines while being significantly faster on synthetic graphs with up to 2k nodes; a sketch of greedy score-based decoding follows this entry.
arXiv Detail & Related papers (2022-07-13T00:12:02Z)
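One way such a model's per-node scores can be turned into a valid order is masked greedy decoding, sketched below; the scoring network itself is omitted and `scores` is assumed given.

```python
import numpy as np

def greedy_topological_decode(scores, adj):
    """Turn per-node scores into a topological order of a DAG.

    adj[i, j] == 1 means an edge i -> j; at each step the highest-scoring
    node with no unscheduled predecessors is emitted."""
    scores = np.asarray(scores)
    indeg = adj.sum(axis=0).astype(int)
    done = np.zeros(len(scores), dtype=bool)
    order = []
    for _ in range(len(scores)):
        ready = np.flatnonzero((indeg == 0) & ~done)
        i = ready[np.argmax(scores[ready])]
        order.append(int(i))
        done[i] = True
        indeg -= adj[i].astype(int)       # release i's successors
    return order
```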
- Soft-mask: Adaptive Substructure Extractions for Graph Neural Networks [40.64326531965043]
A graph neural network should be able to efficiently extract task-relevant structures and be invariant to irrelevant parts.
In this work, we propose to learn graph representations from a sequence of subgraphs of the original graph to better capture task-relevant substructures or hierarchical structures and skip noisy parts.
The soft-mask GNN layer is not limited by a fixed sample or drop ratio, and is therefore flexible enough to extract subgraphs of arbitrary size; a sketch of a soft-mask layer follows this entry.
arXiv Detail & Related papers (2022-06-11T11:04:23Z)
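A single soft-mask layer sketched in NumPy, with hypothetical learned weights `W` and `W_mask`; because the mask is a continuous per-node gate, the selected subgraph is not tied to any fixed size.

```python
import numpy as np

def soft_mask_gnn_layer(X, A, W, W_mask):
    """Gate each node with a learned value in (0, 1), then aggregate
    messages only from the softly selected subgraph."""
    m = 1.0 / (1.0 + np.exp(-(X @ W_mask)))    # (N, 1) soft node mask
    H = A @ (m * X) @ W                        # masked neighbourhood aggregation
    return np.maximum(H, 0.0), m               # ReLU output + mask for inspection
```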
- Gransformer: Transformer-based Graph Generation [14.161975556325796]
Gransformer is a Transformer-based algorithm for generating graphs.
We modify the Transformer encoder to exploit the structural information of the given graph.
We also introduce a graph-based familiarity measure between node pairs.
arXiv Detail & Related papers (2022-03-25T14:05:12Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability; an illustrative kernel layer follows this entry.
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
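An illustrative layer in this spirit, using a toy walk-count kernel and a bank of hypothetical learned prototype graphs; the input graph is represented purely by kernel values, so no embedding of it is computed.

```python
import numpy as np

def walk_count_kernel(A1, A2, steps=3):
    """Toy graph kernel: inner product of total k-step walk counts
    (a valid kernel since it has an explicit feature map)."""
    f = lambda A: np.array([np.linalg.matrix_power(A, s).sum()
                            for s in range(1, steps + 1)])
    return float(f(A1) @ f(A2))

def graph_kernel_layer(A, prototypes):
    """Represent graph A by its kernel similarity to learned prototype graphs."""
    return np.array([walk_count_kernel(A, P) for P in prototypes])
```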
- Accurate Learning of Graph Representations with Graph Multiset Pooling [45.72542969364438]
We propose a Graph Multiset Transformer (GMT) that captures the interaction between nodes according to their structural dependencies.
Our experimental results show that GMT significantly outperforms state-of-the-art graph pooling methods on graph classification benchmarks; a sketch of the attention-pooling step follows this entry.
arXiv Detail & Related papers (2021-02-23T07:45:58Z)
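A sketch of the attention-pooling step, assuming a small bank of hypothetical learned seed vectors that attend over the node embeddings; GMT additionally conditions this attention on graph structure, which is omitted here.

```python
import numpy as np

def multiset_attention_pool(H, S):
    """Compress N node embeddings H (N, d) into k vectors via attention
    from k learned seed vectors S (k, d)."""
    logits = S @ H.T / np.sqrt(H.shape[1])                 # (k, N) scores
    W = np.exp(logits - logits.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)                      # softmax over nodes
    return W @ H                                           # (k, d) pooled output
```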
- My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control [65.77164390203396]
We present a series of ablations on existing methods showing that morphological information encoded in the graph does not improve their performance.
Motivated by the hypothesis that any benefits GNNs extract from the graph structure are outweighed by difficulties they create for message passing, we also propose Amorpheus.
arXiv Detail & Related papers (2020-10-05T08:37:11Z)
- Building powerful and equivariant graph neural networks with structural message-passing [74.93169425144755]
We propose a powerful and equivariant message-passing framework based on two ideas.
First, we propagate a one-hot encoding of the nodes, in addition to the features, in order to learn a local context matrix around each node.
Second, we propose methods for the parametrization of the message and update functions that ensure permutation equivariance; a sketch of one propagation round follows this entry.
arXiv Detail & Related papers (2020-06-26T17:15:16Z)
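A sketch of one propagation round for the one-hot idea: each node carries a local context matrix initialised from node identities and sums its neighbours' matrices; the learned message and update functions are omitted.

```python
import numpy as np

def structural_mp_round(U, A):
    """One structural message-passing round.

    U: (N, N, c) stack of per-node local context matrices; A: (N, N)
    adjacency. Summing neighbours' context matrices keeps the update
    permutation-equivariant."""
    return np.einsum('ij,jnc->inc', A, U)   # U_new[i] = sum over neighbours j of U[j]

# Initialise contexts with one-hot node identities and propagate once.
rng = np.random.default_rng(0)
N = 5
A = (rng.uniform(size=(N, N)) < 0.4).astype(float)
U0 = np.eye(N)[:, :, None]          # (N, N, 1): node i starts knowing only itself
U1 = structural_mp_round(U0, A)     # after one round, U1[i] marks i's neighbours
```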