Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model
- URL: http://arxiv.org/abs/2410.13597v2
- Date: Sat, 24 May 2025 15:57:21 GMT
- Title: Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model
- Authors: Yida Xiong, Kun Li, Jiameng Chen, Hongzhi Zhang, Di Lin, Yan Che, Wenbin Hu
- Abstract summary: We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM). By fusing physically and chemically detailed semantics with specialized molecular representations, TransDLM effectively integrates diverse information sources to guide precise optimization.
- Score: 20.250683535089617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular optimization (MO) is a crucial stage in drug discovery in which task-oriented generated molecules are optimized to meet practical industrial requirements. Existing mainstream MO approaches primarily utilize external property predictors to guide iterative property optimization. However, learning all molecular samples in the vast chemical space is unrealistic for predictors. As a result, errors and noise are inevitably introduced during property prediction due to the nature of approximation. This leads to discrepancy accumulation, reduced generalization, and suboptimal molecular candidates. In this paper, we propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM). TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions, thereby mitigating error propagation during the diffusion process. By fusing physically and chemically detailed textual semantics with specialized molecular representations, TransDLM effectively integrates diverse information sources to guide precise optimization, which enhances the model's ability to balance structural retention and property enhancement. Additionally, the success of a case study further demonstrates TransDLM's ability to solve practical problems. Experimentally, our approach surpasses state-of-the-art methods in maintaining molecular structural similarity and enhancing chemical properties on the benchmark dataset. The code is available at: https://github.com/Cello2195/TransDLM.
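The abstract describes steering a diffusion language model with property requirements embedded in a text description rather than with an external property predictor. A minimal sketch of the general mechanism this implies, classifier-free guidance over molecule token embeddings conditioned on a text prompt, is shown below. It is an illustration only, not the TransDLM implementation; every name (`denoiser`, `text_emb`, `guidance_scale`) is a hypothetical placeholder.

```python
# Minimal sketch of classifier-free guidance for a text-conditioned diffusion
# step over molecule token embeddings. Illustrative only: NOT the TransDLM
# code; `denoiser`, `text_emb`, and `guidance_scale` are assumptions.
import torch

def guided_denoise_step(denoiser, x_t, t, text_emb, guidance_scale=2.0):
    """One reverse-diffusion step with classifier-free guidance.

    x_t:      noisy token embeddings at timestep t, shape (B, L, D)
    text_emb: encoded prompt carrying the property requirements, shape (B, D)
    """
    # Conditional prediction: the text prompt encodes the desired properties.
    eps_cond = denoiser(x_t, t, cond=text_emb)
    # Unconditional prediction: prompt dropped (None stands in for a null token).
    eps_uncond = denoiser(x_t, t, cond=None)
    # Push the sample toward the text-specified properties.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

In this picture, the prompt could be the standardized chemical name of the source molecule plus natural-language property requirements, consistent with the abstract's use of chemical nomenclature as the semantic representation.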
Related papers
- Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv Detail & Related papers (2024-11-01T12:59:25Z) - Chemistry-Inspired Diffusion with Non-Differentiable Guidance [10.573577157257564]
Recent advances in diffusion models have shown remarkable potential in the conditional generation of novel molecules. We propose a novel approach that leverages domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. Instead of relying on neural networks, the oracle provides accurate guidance in the form of estimated gradients, allowing the diffusion process to sample from a conditional distribution specified by quantum chemistry.
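The summary above hinges on obtaining gradient-like guidance from an oracle that cannot be differentiated. One generic way to do this is a zeroth-order (finite-difference) estimate; the sketch below illustrates that idea only and is not the estimator used in the cited paper. The `oracle` callable and the probe parameters are assumptions.

```python
# Hedged sketch: random-direction finite-difference gradient estimate from a
# non-differentiable oracle, usable to steer a sampler toward oracle-preferred
# regions. Generic stand-in, not the cited paper's estimator.
import torch

def oracle_gradient_estimate(oracle, x, eps=1e-2, n_probes=8):
    """Estimate grad_x oracle(x); oracle returns a scalar score for a sample x."""
    grad = torch.zeros_like(x)
    for _ in range(n_probes):
        u = torch.randn_like(x)                       # random probe direction
        delta = oracle(x + eps * u) - oracle(x - eps * u)
        grad += (delta / (2 * eps)) * u               # central difference along u
    return grad / n_probes
```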
arXiv Detail & Related papers (2024-10-09T03:10:21Z) - XMOL: Explainable Multi-property Optimization of Molecules [2.320539066224081]
We propose Explainable Multi-property Optimization of Molecules (XMOL) to optimize multiple molecular properties simultaneously.
Our approach builds on state-of-the-art geometric diffusion models, extending them to multi-property optimization.
We integrate interpretive and explainable techniques throughout the optimization process.
arXiv Detail & Related papers (2024-09-12T06:35:04Z) - Small Molecule Optimization with Large Language Models [17.874902319523663]
We present two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens.
We introduce a novel optimization algorithm that leverages our language models to optimize molecules for arbitrary properties given limited access to a black box oracle.
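The key constraint in the summary above is the limited number of black-box oracle calls. A minimal, hypothetical propose-and-score loop under such a budget might look as follows; `propose` (an LM-backed candidate generator) and `oracle` are placeholders, not the paper's API.

```python
# Hedged sketch of a budget-limited propose-and-score loop: a language model
# proposes candidate molecules, and a black-box oracle scores a shortlist.
# `propose` and `oracle` are illustrative placeholders.
def optimize_with_budget(propose, oracle, seed_smiles, budget=100, per_round=10):
    best, best_score = seed_smiles, oracle(seed_smiles)
    budget -= 1
    while budget > 0:
        candidates = propose(best, n=per_round)       # LM-generated edits of the incumbent
        if not candidates:
            break
        for smi in candidates[: min(per_round, budget)]:
            score = oracle(smi)                       # spend one oracle call
            budget -= 1
            if score > best_score:
                best, best_score = smi, score         # keep the best candidate so far
    return best, best_score
```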
arXiv Detail & Related papers (2024-07-26T17:51:33Z) - LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation. Experiments show that LDMol outperforms the existing autoregressive baselines on the text-to-molecule generation benchmark. We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z) - Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
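Since the summary above enumerates the three components explicitly, a compact PyTorch skeleton helps make the data flow concrete: a learnable latent prior, a causal Transformer that consumes the latent as a prompt, and a property head reading the same latent. All dimensions and module choices below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of the three-component layout described above. Illustrative
# only; sizes, layer counts, and naming are assumptions.
import torch
import torch.nn as nn

class LatentPromptModel(nn.Module):
    def __init__(self, vocab_size=256, d_model=128, latent_dim=64):
        super().__init__()
        # (1) Latent vector with a learnable prior distribution.
        self.prior_mu = nn.Parameter(torch.zeros(latent_dim))
        self.prior_logvar = nn.Parameter(torch.zeros(latent_dim))
        # (2) Causal Transformer generator that uses the latent as a prompt.
        self.latent_to_prompt = nn.Linear(latent_dim, d_model)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)
        # (3) Property predictor reading the latent prompt.
        self.prop_head = nn.Linear(latent_dim, 1)

    def sample_prior(self, batch_size):
        std = torch.exp(0.5 * self.prior_logvar)
        return self.prior_mu + std * torch.randn(batch_size, self.prior_mu.numel())

    def forward(self, tokens, z):
        prompt = self.latent_to_prompt(z).unsqueeze(1)     # latent acts as a prompt token
        x = torch.cat([prompt, self.embed(tokens)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.decoder(x, mask=mask)                     # causal masking over the sequence
        return self.lm_head(h[:, 1:]), self.prop_head(z)   # token logits, property estimate
```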
arXiv Detail & Related papers (2024-02-27T03:33:23Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
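For the few-shot setting mentioned above, the prototypical-network step reduces to averaging support embeddings per class and assigning queries to the nearest prototype. The sketch below assumes the embeddings are already precomputed (e.g., from docking-derived features) and is purely illustrative of that mechanism.

```python
# Hedged sketch of prototypical-network classification over precomputed
# molecular embeddings. Plain NumPy; illustrative of the few-shot step only.
import numpy as np

def proto_classify(support_emb, support_labels, query_emb):
    """support_emb: (N, D); support_labels: (N,); query_emb: (M, D)."""
    classes = np.unique(support_labels)
    # One prototype per class: the mean embedding of its support examples.
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    # Assign each query to the class of the nearest prototype (Euclidean distance).
    dists = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return classes[np.argmin(dists, axis=1)]
```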
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that automatically learns internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z) - Molecule Optimization via Fragment-based Generative Models [21.888942129750124]
In drug discovery, molecule optimization is an important step in which drug candidates are modified into better ones in terms of desired drug properties.
We present an innovative in silico approach to computationally optimizing molecules and formulate the problem as generating optimized molecular graphs.
Our generative models follow the key idea of fragment-based drug design, and optimize molecules by modifying their small fragments.
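As a concrete illustration of the fragment-based starting point, the snippet below decomposes a molecule into small fragments using RDKit's BRICS rules; this is a stand-in for the general idea only, not the paper's own fragmentation or generation scheme.

```python
# Hedged sketch: cut a molecule into small fragments (RDKit BRICS rules) as a
# generative model might before modifying individual fragments. Illustrative
# only; the example molecule and the use of BRICS are assumptions.
from rdkit import Chem
from rdkit.Chem import BRICS

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin as a toy example
fragments = sorted(BRICS.BRICSDecompose(mol))        # fragment SMILES with dummy attachment points
print(fragments)
```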
arXiv Detail & Related papers (2020-12-08T05:52:16Z) - Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have achieved initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
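A minimal way to picture "use the input molecule as an initial guess and sample from the target distribution" is a Metropolis-style accept/reject loop whose unnormalized target is a product of property terms. The sketch below shows only that generic picture, not MIMOSA's actual sampler; `propose_neighbor` and the property functions are placeholders.

```python
# Hedged sketch of multi-constraint sampling started from the input molecule.
# Generic Metropolis accept/reject loop; NOT MIMOSA's sampler. All callables
# are placeholders.
import math
import random

def sample_multi_constraint(seed, propose_neighbor, property_fns, n_steps=200):
    def log_target(mol):
        # Product of property terms becomes a sum of logs (clipped for safety).
        return sum(math.log(max(f(mol), 1e-9)) for f in property_fns)

    current, current_lp = seed, log_target(seed)
    samples = []
    for _ in range(n_steps):
        cand = propose_neighbor(current)              # small edit of the current molecule
        cand_lp = log_target(cand)
        if random.random() < math.exp(min(0.0, cand_lp - current_lp)):
            current, current_lp = cand, cand_lp       # accept the move
        samples.append(current)
    return samples
```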
arXiv Detail & Related papers (2020-10-05T20:18:42Z) - Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM) [0.13854111346209866]
We describe a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without the need for optimization.
We benchmark POEM against industry-standard ML algorithms and published results across 17 classification tasks. POEM performs well in all cases and reduces the risk of overfitting.
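To situate the similarity-based family POEM belongs to, the sketch below shows a generic non-parametric baseline: Tanimoto nearest neighbours over Morgan fingerprints with a majority vote. POEM's Pareto-optimal embedding procedure itself is not reproduced here; the helper names are illustrative.

```python
# Hedged sketch of a generic similarity-based, non-parametric property
# classifier (Tanimoto k-NN over Morgan fingerprints). Not POEM's algorithm.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles):
    # Radius-2 Morgan fingerprint as a 2048-bit vector.
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def knn_predict(query_smiles, train_smiles, train_labels, k=5):
    qfp = fingerprint(query_smiles)
    sims = [DataStructs.TanimotoSimilarity(qfp, fingerprint(s)) for s in train_smiles]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Majority vote over the k most similar training molecules.
    votes = [train_labels[i] for i in top]
    return max(set(votes), key=votes.count)
```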
arXiv Detail & Related papers (2020-02-11T17:20:28Z)