Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets
- URL: http://arxiv.org/abs/2407.13780v1
- Date: Tue, 2 Jul 2024 16:01:37 GMT
- Title: Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets
- Authors: Ulrich A. Mbou Sob, Qiulin Li, Miguel Arbesú, Oliver Bent, Andries P. Smit, Arnu Pretorius
- Abstract summary: We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training.
Our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space.
- Score: 4.047608146173188
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A specific challenge with deep learning approaches for molecule generation is generating both syntactically valid and chemically plausible molecular string representations. To address this, we propose a novel generative latent-variable transformer model for small molecules that leverages a recently proposed molecular string representation called SAFE. We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training and use this to train our model. Our experiments show that our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space. By fine-tuning the model using reinforcement learning to improve molecular docking, we significantly increase the number of hit candidates for five specific protein targets compared to the pre-trained model, nearly doubling this number for certain targets. Additionally, our top 5% mean docking scores are comparable to the current state-of-the-art (SOTA), and we marginally outperform SOTA on three of the five targets.
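As a concrete illustration of the reported metrics, the sketch below shows one way the validity and fragmentation rates could be computed with RDKit, assuming the sampled SAFE strings have already been decoded to SMILES. The helper `sample_smiles` is hypothetical, and the exact denominators used in the paper are an assumption here; this is a minimal sketch, not the authors' implementation.
```python
# Minimal sketch (not the authors' code): a string counts as valid if RDKit can
# parse it, and as fragmented if the parsed molecule has more than one
# disconnected component.
from rdkit import Chem

def validity_and_fragmentation(smiles_list):
    valid, fragmented = 0, 0
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # syntactically or chemically invalid string
        valid += 1
        if len(Chem.GetMolFrags(mol)) > 1:
            fragmented += 1  # disconnected pieces, e.g. "CCO.CC"
    n = max(len(smiles_list), 1)
    validity_rate = valid / n
    fragmentation_rate = fragmented / valid if valid else 0.0
    return validity_rate, fragmentation_rate

# Hypothetical usage: `sample_smiles` stands in for decoding SMILES from
# latent-space samples of the generative model.
# validity, fragmentation = validity_and_fragmentation(sample_smiles(1000))
```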
Related papers
- Active Learning Enables Extrapolation in Molecular Generative Models [11.234291560423943]
We create an active-learning, closed-loop molecule generation pipeline for molecular generative models.
Compared against other generative model approaches, only our active learning approach generates molecules with properties that extrapolate beyond the training data.
The proportion of stable molecules generated is 3.5x higher than that of the next-best model.
arXiv Detail & Related papers (2025-01-03T19:07:06Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction [33.57089414199478]
Reinforcement Learning (RL) has emerged as a promising approach to generating molecules with the docking score (DS) as a reward.
We reproduce, scrutinize, and improve the recent model for molecule generation called FREED (arXiv:2110.01219).
arXiv Detail & Related papers (2024-01-18T09:54:19Z)
- DiffDTM: A conditional structure-free framework for bioactive molecules generation targeted for dual proteins [35.72694124335747]
DiffDTM is a conditional, structure-free deep generative model based on a diffusion model for dual-target molecule generation.
We have conducted comprehensive multi-view experiments to demonstrate that DiffDTM can generate drug-like, synthesis-accessible, novel, and high-binding-affinity molecules.
The experimental results indicate that DiffDTM can be easily plugged into unseen dual targets to generate bioactive molecules.
arXiv Detail & Related papers (2023-06-24T13:08:55Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- SILVR: Guided Diffusion for Molecule Generation [0.0]
We introduce a machine-learning method for conditioning an existing generative model without retraining.
The model allows the generation of new molecules that fit into a binding site of a protein based on fragment hits.
We show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments.
arXiv Detail & Related papers (2023-04-21T11:47:38Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme, MOOD, that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Fragment-based molecular generative model with high generalization ability and synthetic accessibility [0.0]
We propose a fragment-based molecular generative model which designs new molecules with target properties.
A key feature of our model is a high generalization ability in terms of property control and fragment types.
We show that the model can generate molecules with the simultaneous control of multiple target properties at a high success rate.
arXiv Detail & Related papers (2021-11-25T04:44:37Z)
- Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation [34.26748101294543]
We propose a novel framework that generates pharmacochemically acceptable molecules with large docking scores.
Our method constrains the generated molecules to a realistic and qualified chemical space and effectively explores the space to find drugs.
Our model produces molecules of higher quality compared to existing methods while achieving state-of-the-art performance on two of three targets.
arXiv Detail & Related papers (2021-10-04T07:21:00Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.