Copy-Augmented Representation for Structure Invariant Template-Free Retrosynthesis
- URL: http://arxiv.org/abs/2510.16588v1
- Date: Sat, 18 Oct 2025 17:25:36 GMT
- Title: Copy-Augmented Representation for Structure Invariant Template-Free Retrosynthesis
- Authors: Jiaxi Zhuang, Yu Zhang, Aimin Zhou, Ying Qian
- Abstract summary: C-SMILES is a novel representation that decomposes traditional SMILES into element-token pairs with five special tokens. The approach integrates SMILES alignment guidance to make attention more consistent with ground-truth atom mappings. The method reaches 99.9% validity in generated molecules and establishes a new paradigm for structure-aware molecular generation.
- Score: 17.5286075847689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrosynthesis prediction is fundamental to drug discovery and chemical synthesis, requiring the identification of reactants that can produce a target molecule. Current template-free methods struggle to capture the structural invariance inherent in chemical reactions, where substantial molecular scaffolds remain unchanged, leading to unnecessarily large search spaces and reduced prediction accuracy. We introduce C-SMILES, a novel molecular representation that decomposes traditional SMILES into element-token pairs with five special tokens, effectively minimizing editing distance between reactants and products. Building upon this representation, we incorporate a copy-augmented mechanism that dynamically determines whether to generate new tokens or preserve unchanged molecular fragments from the product. Our approach integrates SMILES alignment guidance to enhance attention consistency with ground-truth atom mappings, enabling more chemically coherent predictions. Comprehensive evaluation on USPTO-50K and large-scale USPTO-FULL datasets demonstrates significant improvements: 67.2% top-1 accuracy on USPTO-50K and 50.8% on USPTO-FULL, with 99.9% validity in generated molecules. This work establishes a new paradigm for structure-aware molecular generation with direct applications in computational drug discovery.
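The copy-augmented step described in the abstract, choosing at each decoding step between generating a new token and preserving a token from the product, can be illustrated with a pointer-generator style mixture. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `copy_augmented_distribution`, the scalar gate `p_gen`, the plain character-level SMILES tokens (rather than C-SMILES element-token pairs), and all toy numbers are hypothetical.

```python
from collections import defaultdict


def copy_augmented_distribution(p_vocab, attention, source_tokens, p_gen):
    """Mix a generation distribution with a copy distribution.

    Hypothetical helper, not taken from the paper.
    p_vocab:       dict mapping token -> probability of generating it (sums to 1)
    attention:     weights over source (product) positions (sums to 1)
    source_tokens: tokens of the product sequence, same length as attention
    p_gen:         gate in [0, 1]; 1 means always generate, 0 means always copy
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have the same length")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    mixed = defaultdict(float)
    # Generation branch: scale the vocabulary distribution by the gate.
    for token, prob in p_vocab.items():
        mixed[token] += p_gen * prob
    # Copy branch: each source position passes its attention mass to its token.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy decoding step: the product is ethyl acetate, CC(=O)OCC, split per character.
product_tokens = ["C", "C", "(", "=", "O", ")", "O", "C", "C"]
# Assumed attention concentrated on the last three product positions.
attention = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.1]
# Assumed generator output over a tiny vocabulary.
p_vocab = {"C": 0.3, "O": 0.3, ".": 0.4}

dist = copy_augmented_distribution(p_vocab, attention, product_tokens, p_gen=0.25)
best_token = max(dist, key=dist.get)
print(best_token, round(dist[best_token], 3))
```

With a low gate value the copy branch dominates, so the token favoured by attention over the product wins even though the generator alone would prefer a different token. That is the intuition behind preserving unchanged fragments instead of regenerating them.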
Related papers
- Template-Free Retrosynthesis with Graph-Prior Augmented Transformers [2.538209532048867]
Retrosynthesis reaction prediction aims to infer plausible reactant molecules for a given product. We present a template-free, Transformer-based framework that removes the need for handcrafted reaction templates or additional chemical rule engines. Our model injects molecular graph information into the attention mechanism to jointly exploit SMILES sequences and structural cues.
arXiv Detail & Related papers (2025-12-11T16:08:32Z)
- DeepMech: A Machine Learning Framework for Chemical Reaction Mechanism Prediction [2.15242029196761]
We present DeepMech, an interpretable graph-based deep learning framework to generate chemical reaction mechanisms (CRMs). DeepMech achieves 98.98 ± 0.12% accuracy in predicting elementary steps and 95.94 ± 0.21% in complete CRM tasks.
arXiv Detail & Related papers (2025-09-19T11:14:46Z)
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention [9.018408514318631]
Traditional methods often miss complex molecular structures, leading to inaccuracies.
We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks.
YZS-Model achieved an $R^2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models.
arXiv Detail & Related papers (2024-06-27T12:40:29Z)
- UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment [51.49238426241974]
This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction.
By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules.
arXiv Detail & Related papers (2024-03-25T03:23:03Z)
- Molecule-Edit Templates for Efficient and Accurate Retrosynthesis Prediction [0.16070833439280313]
We introduce METRO, a machine-learning model that predicts reactions using minimal templates.
We achieve state-of-the-art results on standard benchmarks.
arXiv Detail & Related papers (2023-10-11T09:00:02Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is shown to be effective at 1) keeping the embedding space well organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z)
- Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
arXiv Detail & Related papers (2020-11-03T18:51:18Z)
- Learning Graph Models for Retrosynthesis Prediction [90.15523831087269]
Retrosynthesis prediction is a fundamental problem in organic synthesis.
This paper introduces a graph-based approach that capitalizes on the idea that the graph topology of precursor molecules is largely unaltered during a chemical reaction.
Our model achieves a top-1 accuracy of 53.7%, outperforming previous template-free and semi-template-based methods.
arXiv Detail & Related papers (2020-06-12T09:40:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.