Transformers are efficient hierarchical chemical graph learners
- URL: http://arxiv.org/abs/2310.01704v1
- Date: Mon, 2 Oct 2023 23:57:04 GMT
- Title: Transformers are efficient hierarchical chemical graph learners
- Authors: Zihan Pengmei, Zimu Li, Chih-chan Tien, Risi Kondor, Aaron R. Dinner
- Abstract summary: SubFormer is a graph transformer that operates on subgraphs that aggregate information by a message-passing mechanism.
We show that SubFormer exhibits limited over-smoothing and avoids over-squashing, which is prevalent in traditional graph neural networks.
- Score: 7.074125287195362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers, adapted from natural language processing, are emerging as a
leading approach for graph representation learning. Contemporary graph
transformers often treat nodes or edges as separate tokens. This approach leads
to computational challenges for even moderately-sized graphs due to the
quadratic scaling of self-attention complexity with token count. In this paper,
we introduce SubFormer, a graph transformer that operates on subgraphs that
aggregate information by a message-passing mechanism. This approach reduces the
number of tokens and enhances the learning of long-range interactions. We demonstrate
SubFormer on benchmarks for predicting molecular properties from chemical
structures and show that it is competitive with state-of-the-art graph
transformers at a fraction of the computational cost, with training times on
the order of minutes on a consumer-grade graphics card. We interpret the
attention weights in terms of chemical structures. We show that SubFormer
exhibits limited over-smoothing and avoids over-squashing, which is prevalent
in traditional graph neural networks.
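As a rough illustration of the pipeline described above (a minimal sketch, not the authors' implementation), the example below coarsens a molecular graph into subgraph tokens with a few message-passing steps and then runs a standard Transformer encoder over the short token sequence. The class names, hidden sizes, and mean-pooling coarsening are assumptions made for the example; the paper's own subgraph construction and aggregation may differ.

```python
# Minimal sketch (not the authors' code): coarsen a molecular graph into subgraph
# tokens via message passing, then attend over the tokens with a Transformer.
import torch
import torch.nn as nn


class SubgraphTokenizer(nn.Module):
    """Aggregates node features into one token per subgraph via message passing."""

    def __init__(self, node_dim: int, hidden_dim: int, num_mp_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(node_dim, hidden_dim)
        self.mp_layers = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_mp_layers)]
        )

    def forward(self, x, edge_index, subgraph_assignment):
        # x: (num_nodes, node_dim); edge_index: (2, num_edges);
        # subgraph_assignment: (num_nodes,) subgraph id for each node
        h = self.embed(x)
        src, dst = edge_index
        for layer in self.mp_layers:
            agg = torch.zeros_like(h).index_add_(0, dst, h[src])  # sum neighbour messages
            h = torch.relu(layer(h + agg))
        num_subgraphs = int(subgraph_assignment.max()) + 1
        tokens = torch.zeros(num_subgraphs, h.size(1)).index_add_(0, subgraph_assignment, h)
        counts = torch.bincount(subgraph_assignment, minlength=num_subgraphs).clamp(min=1)
        return tokens / counts.unsqueeze(1).to(tokens.dtype)      # mean-pool per subgraph


class SubgraphTransformer(nn.Module):
    """Self-attention runs over subgraph tokens, not individual atoms."""

    def __init__(self, node_dim: int, hidden_dim: int = 64, num_classes: int = 1):
        super().__init__()
        self.tokenizer = SubgraphTokenizer(node_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.readout = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, subgraph_assignment):
        tokens = self.tokenizer(x, edge_index, subgraph_assignment)
        tokens = self.encoder(tokens.unsqueeze(0))   # (1, num_subgraphs, hidden_dim)
        return self.readout(tokens.mean(dim=1))      # graph-level prediction


# Toy usage: 6 atoms partitioned into 2 subgraphs of 3 atoms each.
x = torch.randn(6, 16)
edge_index = torch.tensor([[0, 1, 2, 3, 4, 5], [1, 2, 0, 4, 5, 3]])
assignment = torch.tensor([0, 0, 0, 1, 1, 1])
print(SubgraphTransformer(node_dim=16)(x, edge_index, assignment).shape)  # torch.Size([1, 1])
```

The quadratic self-attention cost then applies to the number of subgraphs rather than the number of atoms, which is what keeps training cheap.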
Related papers
- Learning Graph Quantized Tokenizers for Transformers [28.79505338383552]
Graph Transformers (GTs) have emerged as a leading model in deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks.
We introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging graph self-supervised learning.
By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic datasets.
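A hedged sketch of the tokenization idea described in this entry: node embeddings produced by a separately (self-supervised) trained encoder are snapped to their nearest entry in a learned codebook, and the resulting discrete token ids are what the Transformer consumes. The codebook size, distance metric, and class name are assumptions, and the paper's full training scheme (including token modulation) is not reproduced here.

```python
# Hedged sketch of a vector-quantized graph tokenizer (not the paper's code).
import torch
import torch.nn as nn


class QuantizedGraphTokenizer(nn.Module):
    def __init__(self, embed_dim: int = 64, codebook_size: int = 512):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, embed_dim)

    def forward(self, node_embeddings: torch.Tensor) -> torch.Tensor:
        # node_embeddings: (num_nodes, embed_dim) from a frozen, self-supervised GNN
        distances = torch.cdist(node_embeddings, self.codebook.weight)  # (N, K)
        return distances.argmin(dim=1)  # one discrete token id per node


tokenizer = QuantizedGraphTokenizer()
token_ids = tokenizer(torch.randn(10, 64))
print(token_ids.shape)  # torch.Size([10]); ids feed a Transformer embedding layer
```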
arXiv Detail & Related papers (2024-10-17T17:38:24Z)
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
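The linear-complexity claim can be illustrated with a generic single-layer linear attention over all nodes, where keys and values are contracted before touching the queries; the positive feature map and normalization below are common linear-attention choices, not necessarily SGFormer's exact formulation.

```python
# Generic single-layer linear attention over node features: O(N * d^2), not O(N^2 * d).
import torch
import torch.nn as nn


class OneLayerLinearAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim) node features
        q = torch.relu(self.q(x)) + 1e-6                 # positive feature map
        k = torch.relu(self.k(x)) + 1e-6
        v = self.v(x)
        kv = k.t() @ v                                   # (dim, dim): contract over nodes first
        normalizer = q @ k.sum(dim=0, keepdim=True).t()  # (num_nodes, 1)
        return (q @ kv) / normalizer


layer = OneLayerLinearAttention(dim=32)
print(layer(torch.randn(1000, 32)).shape)  # torch.Size([1000, 32])
```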
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
- VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections [45.27160435758666]
We show that mini-batch training of graph transformers is possible by loading each node's token list in batches.
We further prove this PPR tokenization is viable as a graph convolution network with a fixed filter and jumping knowledge.
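The PPR tokenization mentioned here can be sketched as follows: personalized PageRank scores (computed below by naive power iteration on a dense adjacency matrix) select a fixed-length list of relevant nodes per node, so a mini-batch only needs to load those short token lists. The restart probability, list length, and dense-matrix formulation are simplifying assumptions.

```python
# Hedged sketch of building per-node token lists from personalized PageRank scores.
import torch


def ppr_token_lists(adj: torch.Tensor, alpha: float = 0.15, k: int = 8,
                    iters: int = 50) -> torch.Tensor:
    # adj: (N, N) dense adjacency; column-normalize it into a transition matrix
    P = adj / adj.sum(dim=0).clamp(min=1)
    N = adj.size(0)
    restart = torch.eye(N)
    scores = torch.eye(N)        # column s holds the PPR vector personalized to node s
    for _ in range(iters):
        scores = (1 - alpha) * (P @ scores) + alpha * restart
    return scores.topk(k, dim=0).indices.t()   # (N, k): top-k token list per node


adj = (torch.rand(20, 20) < 0.2).float()
adj = ((adj + adj.t()) > 0).float()            # make it symmetric
print(ppr_token_lists(adj).shape)  # torch.Size([20, 8])
```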
arXiv Detail & Related papers (2024-03-24T06:10:56Z)
- Transformers as Graph-to-Graph Models [13.630495199720423]
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case.
Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions.
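A minimal sketch of what 'inputting graph edges into the attention weight computations' can look like: a learned embedding of each token-pair's edge label is added to the attention logits. The single head, scalar bias, and edge-label vocabulary below are simplifications for illustration.

```python
# Sketch of attention that takes graph edges as input via a per-pair logit bias.
import math
import torch
import torch.nn as nn


class EdgeAwareAttention(nn.Module):
    def __init__(self, dim: int, num_edge_labels: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.edge_bias = nn.Embedding(num_edge_labels, 1)  # one scalar logit per label

    def forward(self, x: torch.Tensor, edge_labels: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) token features; edge_labels: (N, N) integer label for each pair
        logits = self.q(x) @ self.k(x).t() / math.sqrt(x.size(1))
        logits = logits + self.edge_bias(edge_labels).squeeze(-1)  # the graph enters here
        return torch.softmax(logits, dim=-1) @ self.v(x)


attn = EdgeAwareAttention(dim=32, num_edge_labels=4)
out = attn(torch.randn(10, 32), torch.randint(0, 4, (10, 10)))  # label 0 can mean "no edge"
print(out.shape)  # torch.Size([10, 32])
```

Predicting edges back out can be handled symmetrically, with an attention-like scoring function over token pairs.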
arXiv Detail & Related papers (2023-10-27T07:21:37Z)
- Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
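A minimal sketch of the freeze-and-prompt recipe this entry describes: the pre-trained encoder's parameters are frozen and only a small set of learnable prompt tokens, prepended to the graph-token sequence, receives gradients. The encoder class, prompt length, and initialization are assumptions for the example.

```python
# Sketch of prompt tuning with a frozen backbone: only the added tokens are trainable.
import torch
import torch.nn as nn


class PromptTunedEncoder(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module, dim: int, num_prompts: int = 8):
        super().__init__()
        self.encoder = pretrained_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False                          # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) graph tokens from an upstream tokenizer
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        return self.encoder(torch.cat([prompts, tokens], dim=1))


backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)
model = PromptTunedEncoder(backbone, dim=64)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 512 = 8 prompt tokens * 64 dims; the frozen backbone contributes nothing
```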
arXiv Detail & Related papers (2023-09-18T20:12:17Z)
- Are More Layers Beneficial to Graph Transformers? [97.05661983225603]
Current graph transformers run into a bottleneck when trying to improve performance by increasing depth.
Deep graph transformers are limited by the vanishing capacity of global attention.
We propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation.
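A loose sketch of the substructure-token idea (the paper's sampling and encoding are more involved): each node's closed neighbourhood is pooled into an extra token that is appended to the node-token sequence before the Transformer attends over it.

```python
# Sketch of appending pooled substructure tokens to the token sequence.
import torch
import torch.nn as nn


def append_substructure_tokens(node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    # node_feats: (N, dim); adj: (N, N) 0/1 adjacency
    closed = adj + torch.eye(adj.size(0))                       # node plus its neighbours
    sub_tokens = (closed @ node_feats) / closed.sum(dim=1, keepdim=True)
    return torch.cat([node_feats, sub_tokens], dim=0)           # (2N, dim) token sequence


N, dim = 12, 32
adj = (torch.rand(N, N) < 0.2).float()
tokens = append_substructure_tokens(torch.randn(N, dim), ((adj + adj.t()) > 0).float())
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
)
print(encoder(tokens.unsqueeze(0)).shape)  # torch.Size([1, 24, 32])
```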
arXiv Detail & Related papers (2023-03-01T15:22:40Z)
- Attending to Graph Transformers [5.609943831664869]
Transformer architectures for graphs have emerged as an alternative to established techniques for machine learning with graphs.
Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field.
We probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing.
arXiv Detail & Related papers (2023-02-08T16:40:11Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
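The complexity reduction can be pictured with a sampled message-passing step: each node aggregates messages from a small sampled set of partners rather than from all N nodes. The uniform sampling below is a placeholder; a learned, input-conditioned sampler is the point of the paper.

```python
# Sketch of message passing over sampled partners: O(N * K) messages instead of O(N^2).
import torch
import torch.nn as nn


class SampledMessagePassing(nn.Module):
    def __init__(self, dim: int, num_samples: int = 8):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.num_samples = num_samples

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features
        N = x.size(0)
        partners = torch.randint(0, N, (N, self.num_samples))  # (N, K) sampled partner ids
        messages = self.msg(x)[partners]                        # (N, K, dim)
        return torch.relu(x + messages.mean(dim=1))


layer = SampledMessagePassing(dim=32)
print(layer(torch.randn(100, 32)).shape)  # torch.Size([100, 32])
```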
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
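One way to picture an 'entirely structural' kernel layer, sketched under the assumption of a simple p-step random-walk kernel: the input graph is compared against a set of small learnable hidden graphs by counting common walks on their direct-product graphs, and the kernel values themselves form the feature vector. Hidden-graph size, p, and the sigmoid relaxation are illustrative choices, not the paper's exact construction.

```python
# Hedged sketch of a kernel layer: compare the input graph to learnable hidden
# graphs with a p-step random-walk kernel (walk counts on the direct-product graph).
import torch
import torch.nn as nn


class RandomWalkKernelLayer(nn.Module):
    def __init__(self, num_hidden_graphs: int = 4, hidden_nodes: int = 5, p: int = 3):
        super().__init__()
        self.hidden_adj = nn.Parameter(torch.randn(num_hidden_graphs, hidden_nodes, hidden_nodes))
        self.p = p

    def forward(self, adj: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) adjacency of the input graph -> (num_hidden_graphs,) kernel values
        feats = []
        for H in torch.sigmoid(self.hidden_adj):      # relaxed, learnable hidden graph
            H = (H + H.t()) / 2                        # keep it symmetric
            W = torch.kron(adj, H)                     # direct-product graph adjacency
            feats.append(torch.linalg.matrix_power(W, self.p).sum())  # length-p walk count
        return torch.stack(feats)


adj = (torch.rand(8, 8) < 0.3).float()
print(RandomWalkKernelLayer()(((adj + adj.t()) > 0).float()).shape)  # torch.Size([4])
```

Only the adjacency of the input graph is used; no node embedding is computed, which matches the 'entirely structural' remark above.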
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
- Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding the structural information of a graph into the model.
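In the spirit of that insight, the sketch below injects structure in two common ways: node degree added to the token embedding and a learned bias on the attention logits indexed by pairwise shortest-path distance. The caps on degree and distance and the single-head form are illustrative assumptions, not Graphormer's full recipe.

```python
# Sketch of structure-aware attention: degree encoding plus a shortest-path bias.
import math
import torch
import torch.nn as nn


class StructureBiasedAttention(nn.Module):
    def __init__(self, dim: int, max_degree: int = 16, max_dist: int = 8):
        super().__init__()
        self.degree_embed = nn.Embedding(max_degree + 1, dim)   # centrality encoding
        self.dist_bias = nn.Embedding(max_dist + 1, 1)          # spatial encoding
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.max_degree, self.max_dist = max_degree, max_dist

    def forward(self, x, degrees, spd):
        # x: (N, dim); degrees: (N,) node degrees; spd: (N, N) shortest-path distances
        h = x + self.degree_embed(degrees.clamp(max=self.max_degree))
        logits = self.q(h) @ self.k(h).t() / math.sqrt(h.size(1))
        logits = logits + self.dist_bias(spd.clamp(max=self.max_dist)).squeeze(-1)
        return torch.softmax(logits, dim=-1) @ self.v(h)


x, degrees = torch.randn(5, 32), torch.tensor([2, 3, 1, 2, 2])
spd = torch.randint(0, 4, (5, 5))
print(StructureBiasedAttention(dim=32)(x, degrees, spd).shape)  # torch.Size([5, 32])
```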
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.