MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization
- URL: http://arxiv.org/abs/2010.02318v4
- Date: Sun, 30 Jun 2024 23:58:07 GMT
- Title: MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization
- Authors: Tianfan Fu, Cao Xiao, Xinhao Li, Lucas M. Glass, Jimeng Sun
- Abstract summary: Generative models and reinforcement learning approaches have had initial success, but still struggle to optimize multiple drug properties simultaneously.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
- Score: 51.00815310242277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative-model and reinforcement-learning approaches have had initial success, but still struggle to optimize multiple drug properties simultaneously. To address these challenges, we propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution. MIMOSA first pretrains two property-agnostic graph neural networks (GNNs) for molecule topology and substructure-type prediction, where a substructure can be either an atom or a single ring. In each iteration, MIMOSA uses the GNNs' predictions and applies three basic substructure operations (add, replace, delete) to generate new molecules and associated weights. The weights can encode multiple constraints, including similarity and drug-property constraints, based on which we select promising molecules for the next iteration. MIMOSA enables flexible encoding of multiple property and similarity constraints, efficiently generates new molecules that satisfy various property constraints, and achieves up to a 49.6% relative improvement over the best baseline in terms of success rate. The code repository (including the readme file, data preprocessing and model construction, and evaluation) is available at https://github.com/futianfan/MIMOSA.
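The iterative procedure described in the abstract can be sketched as a weighted sampling loop. This is a minimal, self-contained illustration only: molecules are stand-in tuples of substructure labels, the two pretrained GNNs are replaced by uniform random edit proposals, and `prop_score` and `similarity` are hypothetical surrogates, not MIMOSA's actual property oracles or Tanimoto similarity.

```python
import math
import random

random.seed(0)

# Toy substructure vocabulary (a substructure is an atom or a single ring).
VOCAB = ["C", "N", "O", "ring6"]

def propose(mol):
    """Apply one of MIMOSA's three substructure edits: add, replace, delete.
    In the real method the edit site and substructure type are guided by two
    pretrained GNNs; here we sample uniformly as a stand-in."""
    mol = list(mol)
    ops = ["add", "replace", "delete"] if len(mol) > 1 else ["add", "replace"]
    op = random.choice(ops)
    if op == "add":
        mol.insert(random.randrange(len(mol) + 1), random.choice(VOCAB))
    elif op == "replace":
        mol[random.randrange(len(mol))] = random.choice(VOCAB)
    else:
        mol.pop(random.randrange(len(mol)))
    return tuple(mol)

def similarity(mol, ref):
    """Toy stand-in for Tanimoto similarity: substructure-multiset overlap."""
    inter = sum(min(mol.count(s), ref.count(s)) for s in set(mol))
    union = len(mol) + len(ref) - inter
    return inter / union if union else 1.0

def prop_score(mol):
    """Hypothetical drug-property surrogate: reward oxygens, penalize size."""
    return mol.count("O") - 0.1 * len(mol)

def mimosa_sample(seed_mol, iters=30, pop=20, keep=5, lam=2.0):
    """Weighted sampling loop: propose edits, weight candidates by a
    combination of property and similarity constraints, keep the best."""
    population = [seed_mol]
    for _ in range(iters):
        candidates = [propose(m)
                      for m in population
                      for _ in range(pop // len(population))]
        # The weight encodes multiple constraints at once.
        weighted = [(math.exp(prop_score(m) + lam * similarity(m, seed_mol)), m)
                    for m in candidates]
        weighted.sort(key=lambda wm: wm[0], reverse=True)
        population = [m for _, m in weighted[:keep]]
    return population

best = mimosa_sample(("C", "C", "N"))
```

The key design point the sketch preserves is that property and similarity constraints enter only through the candidate weights, so new constraints can be added without retraining the proposal networks.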
Related papers
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties [33.2976176283611]
We present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs.
To obtain desirable molecular fragments, we develop a novel electronic-effect-based fragmentation method.
We show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.
arXiv Detail & Related papers (2023-10-05T11:43:21Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining [36.48602272037559]
Molecule pretraining has quickly become the go-to schema to boost the performance of AI-based drug discovery.
Here, we propose MoleculeSDE to generate the 3D geometry from 2D topologies, and vice versa, directly in the input space.
By comparing with 17 pretraining baselines, we empirically verify that MoleculeSDE can learn an expressive representation with state-of-the-art performance on 26 out of 32 downstream tasks.
arXiv Detail & Related papers (2023-05-28T15:56:02Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Improving Small Molecule Generation using Mutual Information Machine [0.0]
MolMIM is a probabilistic auto-encoder for small molecule drug discovery.
We demonstrate MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty.
We then utilize CMA-ES, a naive black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization.
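The latent-space search described above can be sketched as a black-box loop that only ever calls the decoder and a property scorer, never their gradients. This is a toy illustration, not MolMIM's implementation: a simplified (mu, lambda) evolution strategy stands in for full CMA-ES (no covariance adaptation), and `decode` and `property_score` are hypothetical stand-ins for the trained decoder and a property oracle such as QED.

```python
import random

random.seed(1)
DIM = 8  # assumed latent dimensionality for the sketch

def decode(z):
    """Hypothetical stand-in for the decoder (latent vector -> molecule).
    Returns the vector itself so the sketch stays self-contained."""
    return z

def property_score(mol):
    """Hypothetical property oracle; here: closeness to a target point,
    so the optimum of this toy landscape is the all-0.5 vector."""
    return -sum((x - 0.5) ** 2 for x in mol)

def latent_es_optimize(steps=60, lam=16, mu=4, sigma=0.3):
    """Gradient-free search over the latent space: sample around the current
    mean, score decoded candidates, recombine the elite set."""
    mean = [0.0] * DIM
    for _ in range(steps):
        pop = [[m + random.gauss(0, sigma) for m in mean] for _ in range(lam)]
        pop.sort(key=lambda z: property_score(decode(z)), reverse=True)
        elite = pop[:mu]
        mean = [sum(z[i] for z in elite) / mu for i in range(DIM)]
    return mean

z_best = latent_es_optimize()
```

Because the loop treats decoder and scorer as opaque callables, any property (including non-differentiable ones computed by external chemistry software) can be plugged in, which is the main appeal of black-box search over a learned latent space.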
arXiv Detail & Related papers (2022-08-18T18:32:48Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
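The property-predictor guidance mentioned above can be illustrated with a toy one-dimensional reverse SDE. This is a schematic sketch only, not MOOD's graph diffusion: the learned score network is replaced by the analytic score of a standard normal, and `property_grad` is a hypothetical gradient of log p(property | x) from a differentiable predictor, added to the drift exactly as in classifier-style guidance.

```python
import math
import random

random.seed(2)

def score_model(x, t):
    """Toy stand-in for a learned score network: the score of N(0, 1) data.
    The real model operates on molecular graphs; t is unused in this toy."""
    return -x

def property_grad(x):
    """Hypothetical gradient of log p(property | x) from a differentiable
    property predictor; here it pulls samples toward x = 2."""
    return 2.0 - x

def guided_reverse_sde(steps=500, dt=0.005, guidance=1.0):
    """Euler-Maruyama integration of a guided reverse-time SDE:
    drift = model score + guidance * predictor gradient, plus small noise."""
    x = random.gauss(0, 1)  # start from the prior
    for _ in range(steps):
        drift = score_model(x, None) + guidance * property_grad(x)
        x += drift * dt + math.sqrt(dt) * 0.1 * random.gauss(0, 1)
    return x

x_sample = guided_reverse_sde()
```

In this toy the unguided drift alone settles near 0 while the combined drift settles near 1, showing how the guidance term biases sampling toward high-property regions without retraining the diffusion model.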
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist in learning molecule representations.
Our approach is proven effective in 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.