MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
- URL: http://arxiv.org/abs/2505.20131v1
- Date: Mon, 26 May 2025 15:29:08 GMT
- Title: MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
- Authors: Yuanxin Zhuang, Dazhong Shen, Ying Sun,
- Abstract summary: MolEditRL is a molecular editing framework that integrates structural constraints with precise property optimization. For comprehensive evaluation, we construct MolEdit-Instruct, the largest and most property-rich molecular editing dataset.
- Score: 4.430115182041077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular editing aims to modify a given molecule to optimize desired chemical properties while preserving structural similarity. However, current approaches typically rely on string-based or continuous representations, which fail to adequately capture the discrete, graph-structured nature of molecules, resulting in limited structural fidelity and poor controllability. In this paper, we propose MolEditRL, a molecular editing framework that explicitly integrates structural constraints with precise property optimization. Specifically, MolEditRL consists of two stages: (1) a discrete graph diffusion model pretrained to reconstruct target molecules conditioned on source structures and natural language instructions; (2) an editing-aware reinforcement learning fine-tuning stage that further enhances property alignment and structural preservation by explicitly optimizing editing decisions under graph constraints. For comprehensive evaluation, we construct MolEdit-Instruct, the largest and most property-rich molecular editing dataset, comprising 3 million diverse examples spanning single- and multi-property tasks across 10 chemical attributes. Experimental results demonstrate that MolEditRL significantly outperforms state-of-the-art methods in both property optimization accuracy and structural fidelity, achieving a 74% improvement in editing success rate while using 98% fewer parameters.
Related papers
- MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization [16.402871404193576]
We introduce MolAct, an agentic reinforcement learning framework for molecular design problems.
We instantiate the framework to train two model families: MolEditAgent for molecular editing tasks and MolOptAgent for molecular optimization tasks.
Results show that treating molecular design as a multi-step, tool-augmented process is key to reliable and interpretable improvements.
arXiv Detail & Related papers (2025-12-23T07:53:57Z)
- MolEdit: Knowledge Editing for Multimodal Molecule Language Models [57.85765246726558]
MolEdit is a framework for molecule-to-caption generation and caption-to-molecule generation.
MolEdit combines a Multi-Expert Knowledge Adapter that routes edits to specialized experts for different molecular facets with an Expertise-Aware Editing Switcher.
MolEdit delivers up to 18.8% higher Reliability and 12.0% better Locality than baselines while maintaining efficiency.
arXiv Detail & Related papers (2025-11-16T20:48:37Z)
- Hierarchical Structure-Property Alignment for Data-Efficient Molecular Generation and Editing [14.308978798996472]
HSPAG is a data-efficient framework featuring hierarchical structure-property alignment.
We select representative samples through scaffold clustering and hard samples via an auxiliary variational auto-encoder.
Experiments demonstrate that HSPAG captures fine-grained structure-property relationships and supports controllable generation under multiple property constraints.
arXiv Detail & Related papers (2025-11-11T10:31:09Z)
- Adaptive Substructure-Aware Expert Model for Molecular Property Prediction [5.087741013479207]
Graph Neural Networks (GNNs) have shown promising results by modeling molecules as molecular graphs.
Existing methods often overlook the varying contributions of different substructures to molecular properties.
We propose Molecular-Mol, a novel GNN-based framework that leverages a Mixture-of-Experts (MoE) approach for molecular property prediction.
arXiv Detail & Related papers (2025-04-08T09:25:03Z)
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [20.250683535089617]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM).
By fusing physically and chemically detailed semantics with specialized molecular representations, TransDLM effectively integrates diverse information sources to guide precise optimization.
arXiv Detail & Related papers (2024-10-17T14:30:27Z)
- XMOL: Explainable Multi-property Optimization of Molecules [2.320539066224081]
We propose Explainable Multi-property Optimization of Molecules (XMOL) to optimize multiple molecular properties simultaneously.
Our approach builds on state-of-the-art geometric diffusion models, extending them to multi-property optimization.
We integrate interpretive and explainable techniques throughout the optimization process.
arXiv Detail & Related papers (2024-09-12T06:35:04Z)
- Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'.
We show experimentally that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
arXiv Detail & Related papers (2024-03-20T02:15:55Z)
- DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization [49.85944390503957]
DecompOpt is a structure-based molecular optimization method based on a controllable and decomposed diffusion model.
We show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines.
arXiv Detail & Related papers (2024-03-07T02:53:40Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.