ChemSpaceAL: An Efficient Active Learning Methodology Applied to
Protein-Specific Molecular Generation
- URL: http://arxiv.org/abs/2309.05853v2
- Date: Mon, 4 Dec 2023 00:26:41 GMT
- Authors: Gregory W. Kyro, Anton Morgunov, Rafael I. Brent, Victor S. Batista
- Abstract summary: We present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data.
We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase.
Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The incredible capabilities of generative artificial intelligence models have
inevitably led to their application in the domain of drug discovery. Within
this domain, the vastness of chemical space motivates the development of more
efficient methods for identifying regions with molecules that exhibit desired
characteristics. In this work, we present a computationally efficient active
learning methodology that requires evaluation of only a subset of the generated
data in the constructed sample space to successfully align a generative model
with respect to a specified objective. We demonstrate the applicability of this
methodology to targeted molecular generation by fine-tuning a GPT-based
molecular generator toward a protein with FDA-approved small-molecule
inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules
similar to the inhibitors without prior knowledge of their existence, and even
reproduces two of them exactly. We also show that the methodology is effective
for a protein without any commercially available small-molecule inhibitors, the
HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that
the inherent generality of this method ensures that it will remain applicable
as the exciting field of in silico molecular generation evolves. To facilitate
implementation and reproducibility, we have made all of our software available
through the open-source ChemSpaceAL Python package.
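
The idea of evaluating only a subset of the generated data can be illustrated with a minimal sketch. This is not the ChemSpaceAL package API: the function names, the k-means partition of the sample space, and the score-weighted sampling are all assumptions made for illustration, and `score_fn` stands in for whatever expensive objective (for example, a docking-based score) is being optimized.

```python
import random
from collections import defaultdict


def kmeans(points, k, rng, iters=10):
    """Tiny k-means (illustrative only); returns a cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def active_learning_round(points, score_fn, k=4, per_cluster=3, n_select=20, seed=0):
    """One hypothetical round: score a few points per cluster, then sample
    a fine-tuning set that favors clusters with a higher mean score."""
    rng = random.Random(seed)
    labels = kmeans(points, k, rng)
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)

    # Call the expensive scorer on only a small subset of each cluster.
    evaluated = {}
    cluster_score = {}
    for lab, members in clusters.items():
        subset = rng.sample(members, min(per_cluster, len(members)))
        for idx in subset:
            evaluated[idx] = score_fn(points[idx])
        cluster_score[lab] = sum(evaluated[i] for i in subset) / len(subset)

    # Assumption: clusters are sampled in proportion to their mean score.
    labs = list(clusters)
    weights = [max(cluster_score[lab], 0.0) + 1e-9 for lab in labs]
    chosen = []
    for _ in range(n_select):
        lab = rng.choices(labs, weights=weights)[0]
        chosen.append(rng.choice(clusters[lab]))
    return chosen, evaluated
```

Here `points` would be a list of descriptor tuples for generated molecules. The returned `chosen` indices would form the next fine-tuning set, while `evaluated` holds the scores for the few points that were actually scored, at most `k * per_cluster` of them.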
Related papers
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
(2024-05-05T08:35:23Z)
- Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation [0.0]
Mol-AIR is a reinforcement learning-based framework using adaptive intrinsic rewards for goal-directed molecular generation.
In benchmark tests, Mol-AIR demonstrates superior performance over existing approaches in generating molecules with desired properties.
(2024-03-29T10:44:51Z)
- FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction [33.57089414199478]
Reinforcement Learning (RL) has emerged as a promising approach to generating molecules with the docking score (DS) as a reward.
We reproduce, scrutinize, and improve FREED (arXiv:2110.01219), a recent model for molecule generation.
(2024-01-18T09:54:19Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
(2023-06-09T03:04:21Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
(2022-08-23T17:01:16Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
(2022-06-06T06:17:11Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
(2022-05-17T13:08:28Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective to 1) keep the embedding space well-organized and 2) improve the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
(2021-09-21T00:08:43Z)
- The Synthesizability of Molecules Proposed by Generative Models [3.032184156362992]
Discovery of functional molecules is an expensive and time-consuming process.
One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization.
These techniques can suggest novel molecular structures intended to maximize a multi-objective function.
However, the utility of these approaches is stymied by ignorance of synthesizability.
(2020-02-17T15:41:28Z)