G2GT: Retrosynthesis Prediction with Graph to Graph Attention Neural
Network and Self-Training
- URL: http://arxiv.org/abs/2204.08608v1
- Date: Tue, 19 Apr 2022 01:55:52 GMT
- Title: G2GT: Retrosynthesis Prediction with Graph to Graph Attention Neural
Network and Self-Training
- Authors: Zaiyun Lin (Beijing Stonewise Technology) and Shiqiu Yin (Beijing
Stonewise Technology) and Lei Shi (Beijing Stonewise Technology) and Wenbiao
Zhou (Beijing Stonewise Technology) and YingSheng Zhang (Beijing Stonewise
Technology)
- Abstract summary: Retrosynthesis prediction is one of the fundamental challenges in organic chemistry and related fields.
We propose a new graph-to-graph transformation model, G2GT, in which the graph encoder and graph decoder are built upon the standard transformer structure.
We show that self-training, a powerful data augmentation method, can significantly improve the model's performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrosynthesis prediction is one of the fundamental challenges in
organic chemistry and related fields. The goal is to find reactant molecules
that can synthesize product molecules. To solve this task, we propose a new
graph-to-graph transformation model, G2GT, in which the graph encoder and
graph decoder are built upon the standard transformer structure. We also show
that self-training, a powerful data augmentation method that utilizes
unlabeled molecule data, can significantly improve the model's performance.
Inspired by reaction-type labels and ensemble learning, we propose a novel
weak-ensemble method to enhance diversity. We combine beam search, nucleus
sampling, and top-k sampling to further improve inference diversity, and we
propose a simple ranking algorithm to retrieve the final top-10 results. We
achieve new state-of-the-art results on both the USPTO-50K dataset, with a
top-1 accuracy of 54%, and the larger USPTO-full dataset, with a top-1
accuracy of 50%, along with competitive top-10 results.
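As an illustration of the decoding recipe described in the abstract, the
sketch below shows generic top-k and nucleus (top-p) logit filters together
with a simple frequency-based ranking that merges candidates pooled from
several decoding runs. This is a minimal sketch, not the authors' code: the
function names are hypothetical, and the frequency-count ranking is an
assumed stand-in for the paper's unspecified ranking algorithm.

```python
# Hedged sketch: generic top-k and nucleus (top-p) filters for a
# next-token distribution (logits: 1-D float array), plus a
# frequency-based ranking that merges candidates from several
# decoding runs. Illustrative only.
from collections import Counter
import numpy as np

def top_k_filter(logits, k):
    """Keep the k highest logits; mask everything else to -inf."""
    masked = np.full_like(logits, -np.inf)
    keep = np.argsort(logits)[-k:]
    masked[keep] = logits[keep]
    return masked

def nucleus_filter(logits, p):
    """Keep the smallest token set whose cumulative probability >= p."""
    order = np.argsort(logits)[::-1]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    n_keep = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    masked = np.full_like(logits, -np.inf)
    masked[order[:n_keep]] = logits[order[:n_keep]]
    return masked

def rank_candidates(all_samples, n=10):
    """Rank reactant candidates by how often independent runs agree."""
    return [cand for cand, _ in Counter(all_samples).most_common(n)]

# e.g. rank_candidates(beam_outputs + nucleus_outputs + top_k_outputs)
```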
Related papers
- GSTAM: Efficient Graph Distillation with Structural Attention-Matching [13.673737442696154]
We introduce Graph Distillation with Structural Attention Matching (GSTAM), a novel method for condensing graph classification datasets.
GSTAM leverages the attention maps of GNNs to distill structural information from the original dataset into synthetic graphs.
Comprehensive experiments demonstrate GSTAM's superiority over existing methods, achieving 0.45% to 6.5% better performance at extreme condensation ratios.
arXiv Detail & Related papers (2024-08-29T19:40:04Z)
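As a rough illustration of the attention matching described in the GSTAM
summary above, the sketch below penalizes the mismatch between per-layer GNN
attention summaries computed on real graphs and on the synthetic graphs. It
is an assumption-laden sketch, not GSTAM's actual objective: the pooled 1-D
per-layer summaries and the MSE criterion are stand-ins.

```python
# Hedged sketch of structural attention matching: penalize the gap
# between normalized per-layer GNN attention summaries from real
# graphs and from the synthetic (condensed) graphs. The 1-D pooled
# summary per layer and the MSE criterion are assumptions.
import torch
import torch.nn.functional as F

def attention_matching_loss(real_maps, syn_maps):
    """real_maps, syn_maps: lists of 1-D tensors, one per GNN layer."""
    loss = torch.zeros(())
    for a_real, a_syn in zip(real_maps, syn_maps):
        loss = loss + F.mse_loss(F.normalize(a_real, dim=0),
                                 F.normalize(a_syn, dim=0))
    return loss
```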
- MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z)
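The MolGrapher summary above describes a concrete data structure: candidate
atoms and candidate bonds both become nodes of a single graph that a GNN then
classifies. A minimal sketch of such a graph, with hypothetical field names,
might look like this:

```python
# Hedged sketch: a graph in which candidate atoms AND candidate bonds
# are both nodes, so one GNN can classify every node. Field names and
# feature handling are hypothetical.
from dataclasses import dataclass, field

@dataclass
class SuperGraph:
    kinds: list = field(default_factory=list)   # "atom" or "bond" per node
    feats: list = field(default_factory=list)   # one feature vector per node
    edges: list = field(default_factory=list)   # (src, dst) node-index pairs

    def add_atom(self, features):
        self.kinds.append("atom")
        self.feats.append(features)
        return len(self.kinds) - 1

    def add_bond(self, atom_i, atom_j, features):
        """A candidate bond is its own node, linked to its two atoms."""
        b = len(self.kinds)
        self.kinds.append("bond")
        self.feats.append(features)
        self.edges += [(atom_i, b), (b, atom_j)]
        return b
```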
- Permutation Equivariant Graph Framelets for Heterophilous Graph Learning [6.679929638714752]
We develop a new way to implement multi-scale extraction by constructing Haar-type graph framelets.
We show that our model can achieve the best performance on certain datasets of heterophilous graphs.
arXiv Detail & Related papers (2023-06-07T09:05:56Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, with an average ROC-AUC improvement of 13.8% on classification tasks and an average RMSE/MAE improvement of 35.1% on regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
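Fusing a molecule's two views (its molecular graph and its knowledge-graph
substructure), as the GODE summary above describes, is commonly done with an
InfoNCE-style contrastive objective. The sketch below shows that generic
objective, not GODE's exact loss, and the embedding shapes are assumptions.

```python
# Hedged sketch of a bi-level contrastive objective: pull each
# molecule's graph embedding toward its knowledge-graph substructure
# embedding, push it from other molecules in the batch (InfoNCE-style).
import torch
import torch.nn.functional as F

def contrastive_loss(z_mol, z_kg, tau=0.1):
    z_mol = F.normalize(z_mol, dim=1)      # [B, d] molecule-level GNN view
    z_kg = F.normalize(z_kg, dim=1)        # [B, d] KG-substructure GNN view
    logits = z_mol @ z_kg.t() / tau        # pairwise similarities
    targets = torch.arange(z_mol.size(0))  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```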
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only a single step, without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
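The one-step scheme summarized above compares gradients on real and synthetic
data at a freshly initialized network and never trains the network itself. A
minimal sketch, with the loss function and the cosine-style gradient distance
assumed rather than taken from the paper:

```python
# Hedged sketch of one-step gradient matching: at a freshly
# initialized network, make the loss gradient on synthetic graphs
# mimic the gradient on real graphs. Only the synthetic data is
# optimized; the network weights are never trained.
import torch
import torch.nn.functional as F

def one_step_matching_loss(net, loss_fn, real_batch, syn_batch):
    g_real = torch.autograd.grad(loss_fn(net, real_batch),
                                 net.parameters())
    g_syn = torch.autograd.grad(loss_fn(net, syn_batch),
                                net.parameters(),
                                create_graph=True)  # grads flow to syn data
    # sum a cosine-style distance over matching parameter tensors
    return sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
               for a, b in zip(g_real, g_syn))
```

Minimizing this loss with respect to the synthetic graph parameters is what
drives the condensation; the network is re-initialized rather than trained.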
- Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction [2.5655440962401617]
We describe a novel Graph2SMILES model that combines the power of Transformer models for text generation with the permutation invariance of molecular graph encoders.
As an end-to-end architecture, Graph2SMILES can be used as a drop-in replacement for the Transformer in any task involving molecule(s)-to-molecule(s) transformations.
arXiv Detail & Related papers (2021-10-19T01:23:15Z)
- Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
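FLAG's core idea, as summarized above, is to take a few gradient-ascent steps
on a feature perturbation while the model's parameter gradients accumulate
across those steps. The sketch below is a simplified rendering; the
sign-based ascent step and the hyperparameters are assumptions, not FLAG's
published update rule.

```python
# Hedged sketch in the spirit of FLAG: perturb node features x with a
# few gradient-ascent steps while the model's parameter gradients
# accumulate "for free" across those steps, then apply one optimizer
# update. Step rule and hyperparameters are simplifications.
import torch

def flag_style_step(model, x, y, criterion, optimizer,
                    ascent_steps=3, step_size=1e-3):
    delta = torch.empty_like(x).uniform_(-step_size, step_size)
    delta.requires_grad_()
    optimizer.zero_grad()
    for _ in range(ascent_steps):
        loss = criterion(model(x + delta), y) / ascent_steps
        loss.backward()                      # accumulates param grads too
        grad = delta.grad.detach()
        # ascend on the perturbation, keep the accumulated model grads
        delta = (delta.detach() + step_size * grad.sign()).requires_grad_()
    optimizer.step()                         # one update from pooled grads
    return loss.item()
```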
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
- Uncovering the Folding Landscape of RNA Secondary Structure with Deep Graph Embeddings [71.20283285671461]
We propose a geometric scattering autoencoder (GSAE) network for learning such graph embeddings.
Our embedding network first extracts rich graph features using the recently proposed geometric scattering transform.
We show that GSAE organizes RNA graphs both by structure and energy, accurately reflecting bistable RNA structures.
arXiv Detail & Related papers (2020-06-12T00:17:59Z)
- Graph-Aware Transformer: Is Attention All Graphs Need? [5.240000443825077]
GRaph-Aware Transformer (GRAT) is the first Transformer-based model that can encode and decode whole graphs in an end-to-end fashion.
GRAT has shown very promising results, including state-of-the-art performance on 4 regression tasks in the QM9 benchmark.
arXiv Detail & Related papers (2020-06-09T12:13:56Z)
- A Graph to Graphs Framework for Retrosynthesis Prediction [42.99048270311063]
A fundamental problem in computational chemistry is to find a set of reactants to synthesize a target molecule.
We propose a novel template-free approach called G2Gs by transforming a target molecular graph into a set of reactant molecular graphs.
G2Gs significantly outperforms existing template-free approaches by up to 63% in terms of top-1 accuracy.
arXiv Detail & Related papers (2020-03-28T06:16:56Z)