Retrieval-based Controllable Molecule Generation
- URL: http://arxiv.org/abs/2208.11126v3
- Date: Mon, 24 Apr 2023 17:50:51 GMT
- Title: Retrieval-based Controllable Molecule Generation
- Authors: Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk,
Anima Anandkumar
- Abstract summary: We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
- Score: 63.44583084888342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating new molecules with specified chemical and biological properties
via generative models has emerged as a promising direction for drug discovery.
However, existing methods require extensive training/fine-tuning with a large
dataset, often unavailable in real-world generation tasks. In this work, we
propose a new retrieval-based framework for controllable molecule generation.
We use a small set of exemplar molecules, i.e., those that (partially) satisfy
the design criteria, to steer the pre-trained generative model towards
synthesizing molecules that satisfy the given design criteria. We design a
retrieval mechanism that retrieves and fuses the exemplar molecules with the
input molecule, which is trained by a new self-supervised objective that
predicts the nearest neighbor of the input molecule. We also propose an
iterative refinement process to dynamically update the generated molecules and
retrieval database for better generalization. Our approach is agnostic to the
choice of generative models and requires no task-specific fine-tuning. On
various tasks ranging from simple design criteria to a challenging real-world
scenario for designing lead compounds that bind to the SARS-CoV-2 main
protease, we demonstrate our approach extrapolates well beyond the retrieval
database, and achieves better performance and wider applicability than previous
methods. Code is available at https://github.com/NVlabs/RetMol.
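The retrieve, fuse, and iteratively refine loop described in the abstract can be illustrated with a toy sketch. This is not the RetMol implementation: molecules are stood in for by plain embedding vectors, the frozen generative model is replaced by a Gaussian perturbation, and the names `retrieve`, `fuse`, `refine`, and `score` are hypothetical, chosen only for this illustration. The softmax-weighted fusion is likewise an assumed stand-in for whatever fusion module the authors train.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, database, k):
    """Return the k database embeddings most similar to the query."""
    return sorted(database, key=lambda e: cosine(query, e), reverse=True)[:k]


def fuse(query, exemplars, temperature=1.0):
    """Assumed fusion rule: softmax-weighted mean of exemplars, averaged with the query."""
    logits = [sum(q * e for q, e in zip(query, ex)) / temperature for ex in exemplars]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    context = [
        sum(w * ex[i] for w, ex in zip(weights, exemplars)) / total
        for i in range(len(query))
    ]
    return [0.5 * q + 0.5 * c for q, c in zip(query, context)]


def refine(query, database, score, rounds=10, k=3, samples=8, noise=0.1, seed=0):
    """Iteratively update both the current best candidate and the retrieval database."""
    rng = random.Random(seed)
    database = [list(e) for e in database]  # copy so the caller's list is untouched
    best = list(query)
    for _ in range(rounds):
        exemplars = retrieve(best, database, k)
        fused = fuse(best, exemplars)
        # Stand-in for decoding with a frozen generative model (assumption):
        # sample candidates by perturbing the fused embedding.
        candidates = [[x + rng.gauss(0.0, noise) for x in fused] for _ in range(samples)]
        # Candidates that beat the weakest retrieved exemplar join the database.
        threshold = min(score(e) for e in exemplars)
        database.extend(c for c in candidates if score(c) > threshold)
        # The best candidate becomes the next input only if it improves the score.
        top = max(candidates, key=score)
        if score(top) > score(best):
            best = top
    return best, database


# Hypothetical design criterion: be close to a target point in embedding space.
target = [1.0, 1.0]
score = lambda v: -math.dist(v, target)
start = [0.0, 0.0]
exemplar_db = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5], [-0.5, 0.2]]
best, grown_db = refine(start, exemplar_db, score)
```

Note that no model weights are updated anywhere in the loop; only the input and the database change between rounds, which mirrors the abstract's claim that no task-specific fine-tuning is required.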
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - ControlMol: Adding Substruture Control To Molecule Diffusion Models [2.8372258697984627]
We present ControlMol, which adds sub-structure control to molecule generation with diffusion models.
We apply our method to both 2D and 3D molecule generation tasks.
arXiv Detail & Related papers (2024-04-22T14:36:19Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - CELLS: Cost-Effective Evolution in Latent Space for Goal-Directed
Molecular Generation [23.618366377098614]
We propose a cost-effective evolution strategy in latent space, which optimizes the molecular latent representation vectors.
We adopt a pre-trained molecular generative model to map the latent and observation spaces.
We conduct extensive experiments on multiple optimization tasks comparing the proposed framework to several advanced techniques.
arXiv Detail & Related papers (2021-11-30T11:02:18Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z) - Scaffold-constrained molecular generation [0.0]
We build on the well-known SMILES-based Recurrent Neural Network (RNN) generative model, with a modified sampling procedure to achieve scaffold-constrained generation.
We showcase the method's ability to perform scaffold-constrained generation on various tasks.
arXiv Detail & Related papers (2020-09-15T15:41:18Z) - Learning To Navigate The Synthetically Accessible Chemical Space Using
Reinforcement Learning [75.95376096628135]
We propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design.
In this setup, the agent learns to navigate through the immense synthetically accessible chemical space.
We describe how the end-to-end training in this study represents an important paradigm in radically expanding the synthesizable chemical space.
arXiv Detail & Related papers (2020-04-26T21:40:03Z) - The Synthesizability of Molecules Proposed by Generative Models [3.032184156362992]
Discovery of functional molecules is an expensive and time-consuming process.
One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization.
These techniques can suggest novel molecular structures intended to maximize a multi-objective function.
However, the utility of these approaches is stymied by ignorance of synthesizability.
arXiv Detail & Related papers (2020-02-17T15:41:28Z) - Improving Molecular Design by Stochastic Iterative Target Augmentation [38.44457632751997]
Generative models in molecular design tend to be richly parameterized, data-hungry neural models.
We propose a surprisingly effective self-training approach for iteratively creating additional molecular targets.
Our approach outperforms the previous state-of-the-art in conditional molecular design by over 10% in absolute gain.
arXiv Detail & Related papers (2020-02-11T22:40:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.