Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model
- URL: http://arxiv.org/abs/2410.13597v1
- Date: Thu, 17 Oct 2024 14:30:27 GMT
- Title: Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model
- Authors: Yida Xiong, Kun Li, Weiwei Liu, Jia Wu, Bo Du, Shirui Pan, Wenbin Hu,
- Abstract summary: We propose a text-guided multi-property molecular optimization method utilizing transformer-based diffusion language model (TransDLM)
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
- Score: 77.50732023411811
- License:
- Abstract: Molecular optimization (MO) is a crucial stage in drug discovery in which task-oriented generated molecules are optimized to meet practical industrial requirements. Existing mainstream MO approaches primarily utilize external property predictors to guide iterative property optimization. However, learning all molecular samples in the vast chemical space is unrealistic for predictors. As a result, errors and noise are inevitably introduced during property prediction due to the nature of approximation. This leads to discrepancy accumulation, generalization reduction and suboptimal molecular candidates. In this paper, we propose a text-guided multi-property molecular optimization method utilizing transformer-based diffusion language model (TransDLM). TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions, thereby preventing error propagation during diffusion process. Guided by physically and chemically detailed textual descriptions, TransDLM samples and optimizes encoded source molecules, retaining core scaffolds of source molecules and ensuring structural similarities. Moreover, TransDLM enables simultaneous sampling of multiple molecules, making it ideal for scalable, efficient large-scale optimization through distributed computation on web platforms. Furthermore, our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset. The code is available at: https://anonymous.4open.science/r/TransDLM-A901.
Related papers
- Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv Detail & Related papers (2024-11-01T12:59:25Z) - XMOL: Explainable Multi-property Optimization of Molecules [2.320539066224081]
We propose Explainable Multi-property Optimization of Molecules (XMOL) to optimize multiple molecular properties simultaneously.
Our approach builds on state-of-the-art geometric diffusion models, extending them to multi-property optimization.
We integrate interpretive and explainable techniques throughout the optimization process.
arXiv Detail & Related papers (2024-09-12T06:35:04Z) - Small Molecule Optimization with Large Language Models [17.874902319523663]
We present two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens.
We introduce a novel optimization algorithm that leverages our language models to optimize molecules for arbitrary properties given limited access to a black box oracle.
arXiv Detail & Related papers (2024-07-26T17:51:33Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z) - Molecule Optimization via Fragment-based Generative Models [21.888942129750124]
In drug discovery, molecule optimization is an important step in order to modify drug candidates into better ones in terms of desired drug properties.
We present an innovative in silico approach to computationally optimizing molecules and formulate the problem as to generate optimized molecular graphs.
Our generative models follow the key idea of fragment-based drug design, and optimize molecules by modifying their small fragments.
arXiv Detail & Related papers (2020-12-08T05:52:16Z) - Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
generative models and reinforcement learning approaches made initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework to use input molecule as an initial guess and sample molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z) - Predicting drug properties with parameter-free machine learning:
Pareto-Optimal Embedded Modeling (POEM) [0.13854111346209866]
We describe a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without need for optimization.
We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classifications tasks. POEM performs well in all cases and reduces the risk of overfitting.
arXiv Detail & Related papers (2020-02-11T17:20:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.