G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer
- URL: http://arxiv.org/abs/2305.03153v2
- Date: Mon, 14 Aug 2023 17:38:23 GMT
- Title: G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer
- Authors: Kevin Zhang, Vipul Mann, Venkat Venkatasubramanian
- Abstract summary: We propose a chemistry-aware retrosynthesis prediction framework that combines powerful data-driven models with prior domain knowledge.
The proposed framework, grammar-based molecular attention tree transformer (G-MATT), achieves significant performance improvements compared to baseline retrosynthesis models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Various template-based and template-free approaches have been proposed for
single-step retrosynthesis prediction in recent years. While these approaches
demonstrate strong performance from a data-driven metrics standpoint, many
model architectures do not incorporate underlying chemistry principles. Here,
we propose a novel chemistry-aware retrosynthesis prediction framework that
combines powerful data-driven models with prior domain knowledge. We present a
tree-to-sequence transformer architecture that utilizes hierarchical SMILES
grammar-based trees, incorporating crucial chemistry information that is often
overlooked by SMILES text-based representations, such as local structures and
functional groups. The proposed framework, grammar-based molecular attention
tree transformer (G-MATT), achieves significant performance improvements
compared to baseline retrosynthesis models. G-MATT achieves a promising top-1
accuracy of 51% (top-10 accuracy of 79.1%), invalid rate of 1.5%, and bioactive
similarity rate of 74.8% on the USPTO-50K dataset. Additional analyses of
G-MATT attention maps demonstrate the ability to retain chemistry knowledge
without relying on excessively complex model architectures.
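To make the grammar-tree idea concrete, below is a minimal sketch of parsing a SMILES string into a hierarchical tree, assuming a toy context-free grammar fragment (the chain/branch/atom rules here are illustrative stand-ins, not the full SMILES grammar G-MATT uses):

```python
# A minimal sketch of the grammar-tree representation: parse a SMILES string
# with a context-free grammar so that local structure (here, a branch)
# becomes an explicit subtree. The grammar is a toy fragment for illustration.
import nltk

toy_grammar = nltk.CFG.fromstring("""
    chain  -> atom | atom chain | atom branch chain
    branch -> '(' chain ')'
    atom   -> 'C' | 'O' | 'N' | 'c' | 'o' | 'n'
""")

parser = nltk.ChartParser(toy_grammar)
tokens = list("CC(O)C")            # toy tokenizer: one character per token

tree = next(parser.parse(tokens))  # first admissible parse tree
tree.pretty_print()                # the branch "(O)" shows up as a subtree
```

A tree-to-sequence encoder then attends over such parse trees rather than over the flat token sequence, which is how local structures and functional groups become explicit rather than implicit in the text.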
Related papers
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment [51.49238426241974]
This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction.
By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules.
arXiv Detail & Related papers (2024-03-25T03:23:03Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
- Retrosynthesis Prediction with Local Template Retrieval [112.23386062396622]
Retrosynthesis, which predicts the reactants of a given target molecule, is an essential task for drug discovery.
In this work, we introduce RetroKNN, a local reaction template retrieval method (a minimal kNN-retrieval sketch follows this list).
We conduct comprehensive experiments on two widely used benchmarks, the USPTO-50K and USPTO-MIT.
arXiv Detail & Related papers (2023-06-07T03:38:03Z)
- G2GT: Retrosynthesis Prediction with Graph to Graph Attention Neural Network and Self-Training [0.0]
Retrosynthesis prediction is one of the fundamental challenges in organic chemistry and related fields.
We propose a new graph-to-graph transformation model, G2GT, in which the graph encoder and graph decoder are built upon the standard transformer structure.
We show that self-training, a powerful data augmentation method, can significantly improve the model's performance.
arXiv Detail & Related papers (2022-04-19T01:55:52Z)
- Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction [2.5655440962401617]
We describe a novel Graph2SMILES model that combines the power of Transformer models for text generation with the permutation invariance of molecular graph encoders.
As an end-to-end architecture, Graph2SMILES can be used as a drop-in replacement for the Transformer in any task involving molecule(s)-to-molecule(s) transformations (a minimal permutation-invariance sketch follows this list).
arXiv Detail & Related papers (2021-10-19T01:23:15Z)
- Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks [0.07874708385247352]
We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a material's structure and chemistry.
It automatically encapsulates geometric and chemical information directly from the material system.
Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from the commonly-used, manually-curated features.
arXiv Detail & Related papers (2020-10-01T16:31:46Z)
- Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models.
We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction.
This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown (an energy-based ranking sketch follows this list).
arXiv Detail & Related papers (2020-07-14T18:51:06Z)
- State-of-the-Art Augmented NLP Transformer models for direct and single-step retrosynthesis [0.0]
We investigated the effect of different training scenarios on predicting retrosynthesis of chemical compounds.
Data augmentation, a powerful method from image processing, eliminated the effect of data memorization by neural networks (a SMILES-augmentation sketch follows this list).
arXiv Detail & Related papers (2020-03-05T18:11:11Z)
- Retrosynthesis Prediction with Conditional Graph Logic Network [118.70437805407728]
Computer-aided retrosynthesis is finding renewed interest from both chemistry and computer science communities.
We propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks.
arXiv Detail & Related papers (2020-01-06T05:36:57Z)
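The local template retrieval idea in the RetroKNN entry above can be sketched as a plain kNN lookup: embed the target, retrieve the nearest stored embeddings, and vote over their associated reaction templates. Every name and size below is a hypothetical stand-in, not RetroKNN's actual datastore:

```python
# Minimal sketch of kNN template retrieval: nearest stored embeddings vote
# for the reaction template to apply. Datastore contents are made up.
import numpy as np

rng = np.random.default_rng(1)
store_keys = rng.normal(size=(100, 16))    # hypothetical molecule embeddings
store_templates = rng.integers(0, 5, 100)  # template id attached to each key

query = rng.normal(size=16)                # embedding of the target molecule
dists = np.linalg.norm(store_keys - query, axis=1)
nearest = np.argsort(dists)[:8]            # k = 8 nearest neighbors

votes = np.bincount(store_templates[nearest], minlength=5)
predicted_template = int(votes.argmax())   # majority template among neighbors
print(predicted_template)
```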
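For the Graph2SMILES entry, permutation invariance means that relabeling the atoms of a molecular graph leaves the molecule-level encoding unchanged, a property SMILES strings lack. A minimal demonstration with a hypothetical feature matrix and sum pooling (not the Graph2SMILES encoder itself):

```python
# Sum-pooling node features yields the same encoding under any atom ordering.
import numpy as np

rng = np.random.default_rng(0)
atom_features = rng.normal(size=(5, 8))   # 5 atoms, 8 features (hypothetical)

perm = rng.permutation(5)                 # relabel the atoms
pooled = atom_features.sum(axis=0)
pooled_permuted = atom_features[perm].sum(axis=0)

assert np.allclose(pooled, pooled_permuted)  # unchanged by reordering
```

In a full graph-to-sequence model the message-passing layers are permutation equivariant, so an invariant readout like this pooling carries the property through to the decoder.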
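For the energy-based entry, the unifying view is that a model assigns an energy E(x, y) to each pair of product x and candidate reactant set y, with p(y|x) proportional to exp(-E(x, y)); the prediction is the lowest-energy candidate. A minimal sketch with hand-set energies (the paper learns this function; the numbers here are made up):

```python
# Rank candidate reactant sets by energy: p(y|x) = exp(-E(x,y)) / Z,
# and the top-1 prediction is the argmin-energy candidate.
import numpy as np

def softmax_neg_energy(energies):
    """Stable softmax over negative energies."""
    e = np.asarray(energies, dtype=float)
    z = -e - (-e).max()                   # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

energies = [2.3, 0.4, 1.7]                # made-up E(x, y) per candidate
probs = softmax_neg_energy(energies)
best = int(np.argmin(energies))           # lowest energy = top-1 prediction
print(best, probs.round(3))               # -> 1 [0.105 0.703 0.192]
```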
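For the augmented-transformer entry, a standard way to augment SMILES training data is to enumerate randomized, non-canonical SMILES of the same molecule, since many distinct strings describe one structure. A minimal RDKit sketch, illustrative rather than that paper's exact pipeline:

```python
# Random SMILES enumeration: each output line is a different valid SMILES
# for the same molecule, multiplying training data without new chemistry.
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin

augmented = {
    Chem.MolToSmiles(mol, canonical=False, doRandom=True)
    for _ in range(10)
}
for smi in sorted(augmented):
    print(smi)
```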
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.