Faster and more diverse de novo molecular optimization with double-loop
reinforcement learning using augmented SMILES
- URL: http://arxiv.org/abs/2210.12458v1
- Date: Sat, 22 Oct 2022 14:36:38 GMT
- Title: Faster and more diverse de novo molecular optimization with double-loop
reinforcement learning using augmented SMILES
- Authors: Esben Jannik Bjerrum, Christian Margreitter, Thomas Blaschke, Raquel
Lopez-Rios de Castro
- Abstract summary: We propose to use double-loop reinforcement learning with simplified molecular line entry system (SMILES) augmentation to use scoring calculations more efficiently.
We find that augmentation repeats between 5-10x seem safe for most scoring functions and additionally increase the diversity of the generated compounds.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular generation via deep learning models in combination with
reinforcement learning is a powerful way of generating proposed molecules with
desirable properties. By defining a multi-objective scoring function, it is
possible to generate thousands of candidate molecules that score well, which
makes the approach interesting for drug discovery or materials science
purposes. However, if the scoring function is expensive in terms of resources
such as time or computation, the high number of function evaluations needed
for feedback in the reinforcement learning loop becomes a bottleneck. Here we
propose to use double-loop reinforcement learning with simplified molecular
line entry system (SMILES) augmentation to use scoring calculations more
efficiently and arrive at well-scoring molecules faster. By adding an inner
loop in which the generated SMILES strings are augmented to alternative
non-canonical SMILES and used for additional rounds of reinforcement learning,
we can effectively reuse the scoring calculations that are done on the
molecular level. This approach speeds up the learning process in terms of
scoring-function calls and also offers moderate protection against mode
collapse. We find that augmentation repeats between 5-10x seem safe for most
scoring functions; they additionally increase the diversity of the generated
compounds and make sampling runs over chemical space more reproducible.
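To make the mechanism concrete, below is a minimal Python/RDKit sketch of the double-loop idea as described in the abstract: the outer loop pays for the expensive scoring once per molecule, and an inner loop reuses those cached scores as rewards for augmented, non-canonical SMILES of the same molecules. The `policy` object with its `sample`/`update` methods and the `scoring_fn` are illustrative placeholders rather than the authors' implementation; only the RDKit randomized-SMILES call is an actual API.

```python
from rdkit import Chem


def augment_smiles(smiles, n_augmentations=5):
    """Return up to n_augmentations alternative non-canonical SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []
    variants = set()
    for _ in range(n_augmentations * 10):  # cap attempts for tiny molecules
        variants.add(Chem.MolToSmiles(mol, canonical=False, doRandom=True))
        if len(variants) >= n_augmentations:
            break
    return list(variants)


def double_loop_rl(policy, scoring_fn, n_outer_steps=100, batch_size=64,
                   n_inner_loops=5):
    """Hypothetical double-loop driver; `policy` and `scoring_fn` are placeholders."""
    for _ in range(n_outer_steps):
        # Outer loop: sample molecules and call the expensive scoring
        # function once per unique SMILES.
        smiles_batch = policy.sample(batch_size)
        scores = {smi: scoring_fn(smi) for smi in smiles_batch}

        # Inner loop: update the policy on alternative non-canonical SMILES
        # of the same molecules, reusing the cached scores as rewards.
        for _ in range(n_inner_loops):
            augmented, rewards = [], []
            for smi, score in scores.items():
                variants = augment_smiles(smi, n_augmentations=1)
                if variants:
                    augmented.append(variants[0])
                    rewards.append(score)  # a molecular property, identical
                                           # for every SMILES form
            policy.update(augmented, rewards)
    return policy
```

Because the score is a property of the molecule rather than of any particular SMILES string, all augmented variants can legitimately share the cached reward, which is what saves scoring-function calls.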
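The abstract also claims that augmentation increases the diversity of the generated compounds. A standard way to quantify such a claim (not necessarily the exact measure used in the paper) is internal diversity, i.e. the mean pairwise Tanimoto distance over Morgan fingerprints, sketched below.

```python
from itertools import combinations

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def internal_diversity(smiles_list, radius=2, n_bits=2048):
    """Mean pairwise Tanimoto distance over Morgan fingerprints (1.0 = most diverse)."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits=n_bits)
           for m in mols if m is not None]
    if len(fps) < 2:
        return 0.0
    dists = [1.0 - DataStructs.TanimotoSimilarity(a, b)
             for a, b in combinations(fps, 2)]
    return sum(dists) / len(dists)


# Example with a few arbitrary molecules (ethanol, phenol, aspirin).
print(internal_diversity(["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]))
```

Tracking this metric over a sampling run is one way to check whether higher augmentation repeat counts actually broaden the explored chemical space.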
Related papers
- Diversity-Aware Reinforcement Learning for de novo Drug Design [2.356290293311623]
Fine-tuning a pre-trained generative model has demonstrated good performance in generating promising drug molecules.
No study has examined how different adaptive update mechanisms for the reward function influence the diversity of generated molecules.
Our experiments reveal that combining structure- and prediction-based methods generally yields better results in terms of molecular diversity.
arXiv Detail & Related papers (2024-10-14T12:25:23Z)
- Zero Shot Molecular Generation via Similarity Kernels [0.6597195879147557]
We present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation.
SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules.
We also release an interactive web tool that allows users to generate structures with SiMGen online.
arXiv Detail & Related papers (2024-02-13T17:53:44Z)
- Attention Based Molecule Generation via Hierarchical Variational Autoencoder [0.0]
We show that by combining recurrent neural networks with convolutional networks in a hierarchical manner, we are able to extract autoregressive information from SMILES strings.
This allows for generation with very high validity rates, on the order of 95%, when reconstructing known molecules.
arXiv Detail & Related papers (2024-01-18T21:45:12Z)
- Utilizing Reinforcement Learning for de novo Drug Design [2.5740778707024305]
We develop a unified framework for using reinforcement learning for de novo drug design.
We study various on- and off-policy reinforcement learning algorithms and replay buffers to learn an RNN-based policy.
Our findings suggest that it is advantageous to use at least both top-scoring and low-scoring molecules for updating the policy.
arXiv Detail & Related papers (2023-03-30T07:40:50Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme, MOOD, that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We present an ablation analysis of these algorithms, as well as graphs indicating their performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)