Machine learning on DNA-encoded libraries: A new paradigm for
hit-finding
- URL: http://arxiv.org/abs/2002.02530v1
- Date: Fri, 31 Jan 2020 19:31:23 GMT
- Title: Machine learning on DNA-encoded libraries: A new paradigm for
hit-finding
- Authors: Kevin McCloskey, Eric A. Sigel, Steven Kearnes, Ling Xue, Xia Tian,
Dennis Moccia, Diana Gikunju, Sana Bazzaz, Betty Chan, Matthew A. Clark, John
W. Cuozzo, Marie-Aude Guié, John P. Guilinger, Christelle Huguet,
Christopher D. Hupp, Anthony D. Keefe, Christopher J. Mulhern, Ying Zhang,
and Patrick Riley
- Abstract summary: We demonstrate a new approach applying machine learning to DEL selection data.
We train models using only DEL selection data and apply automated or automatable filters.
The approach is effective, with an overall hit rate of ~30% at 30 µM and discovery of potent compounds (IC50 <10 nM) for every target.
- Score: 4.473676566828977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DNA-encoded small molecule libraries (DELs) have enabled discovery of novel
inhibitors for many distinct protein targets of therapeutic value through
screening of libraries with up to billions of unique small molecules. We
demonstrate a new approach applying machine learning to DEL selection data by
identifying active molecules from a large commercial collection and a virtual
library of easily synthesizable compounds. We train models using only DEL
selection data and apply automated or automatable filters with chemist review
restricted to the removal of molecules with potential for instability or
reactivity. We validate this approach with a large prospective study (nearly
2000 compounds tested) across three diverse protein targets: sEH (a hydrolase),
ERα (a nuclear receptor), and c-KIT (a kinase). The approach is
effective, with an overall hit rate of ~30% at 30 µM and discovery
of potent compounds (IC50 <10 nM) for every target. The model makes useful
predictions even for molecules dissimilar to the original DEL and the compounds
identified are diverse, predominantly drug-like, and different from known
ligands. Collectively, the quality and quantity of DEL selection data, the
power of modern machine learning methods, and access to large, inexpensive,
commercially available libraries create a powerful new approach for hit
finding.
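The selection step described in the abstract (score a purchasable library with a trained model, drop compounds flagged for instability or reactivity, keep a diverse set, then measure the hit rate) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Candidate` fields, the greedy diversity rule, the similarity threshold, and the set-based fingerprints are all hypothetical stand-ins chosen for illustration, and the model scores are assumed to be precomputed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one purchasable compound."""
    name: str
    fingerprint: frozenset   # assumed: set of on-bit indices of a structural fingerprint
    score: float             # assumed: model-predicted activity score, higher is better
    flagged_reactive: bool   # assumed: output of an automated stability/reactivity filter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_for_testing(candidates, max_picks, max_similarity=0.5):
    """Greedily pick top-scoring, unflagged, mutually dissimilar candidates."""
    picks = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.flagged_reactive:
            continue  # removed by the automated filter
        too_close = any(
            tanimoto(cand.fingerprint, p.fingerprint) > max_similarity
            for p in picks
        )
        if too_close:
            continue  # skip near-duplicates of compounds already picked
        picks.append(cand)
        if len(picks) == max_picks:
            break
    return picks


def hit_rate(n_hits, n_tested):
    """Fraction of tested compounds that were active at the chosen concentration."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_hits / n_tested
```

In practice the fingerprints and scores would come from a cheminformatics toolkit and a trained model; the sketch only shows how filtering, diversity selection, and the hit-rate calculation fit together.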
Related papers
- KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors [2.0179908661487986]
We present KinDEL, one of the first large, publicly available DEL datasets on two kinases.
We benchmark different machine learning techniques to develop predictive models for hit identification.
We provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules.
arXiv Detail & Related papers (2024-10-11T16:03:58Z)
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
We propose a Multimodal Pretraining DEL-Fusion model (MPDF).
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
- RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
- Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
- Multi-objective Molecular Optimization for Opioid Use Disorder Treatment Using Generative Network Complex [5.33208055504216]
Opioid Use Disorder (OUD) has emerged as a significant global health issue.
In this study, we propose a deep generative model that combines stochastic differential equation (SDE)-based diffusion modeling with the latent space of a pretrained autoencoder model.
The molecular generator enables efficient generation of molecules that are effective on multiple targets.
arXiv Detail & Related papers (2023-06-13T01:12:31Z)
- Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks [0.0]
We propose an end-to-end generative system, DrugGEN, for the de novo design of drug candidate molecules.
The system is trained using a large dataset of drug-like compounds and target-specific bioactive molecules.
Using the open-access DrugGEN, it is possible to easily train models for other druggable proteins.
arXiv Detail & Related papers (2023-02-15T18:59:27Z)
- DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries [1.290382979353427]
We introduce a new paradigm, DEL-Dock, that combines ligand-based descriptors with 3-D spatial information from docked protein-ligand complexes.
We show that our model is capable of effectively denoising DEL count data to predict molecule enrichment scores.
arXiv Detail & Related papers (2022-11-30T22:00:24Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Partial Product Aware Machine Learning on DNA-Encoded Libraries [0.0]
Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest dissimilar from those in the original DEL.
We leverage reaction yield data to enumerate the set of possible molecules corresponding to a given DNA tag.
arXiv Detail & Related papers (2022-05-16T23:18:02Z)
- Neural networks for Anatomical Therapeutic Chemical (ATC) [83.73971067918333]
We propose combining multiple multi-label classifiers trained on distinct sets of features, including sets extracted from a Bidirectional Long Short-Term Memory Network (BiLSTM).
Experiments demonstrate the power of this approach, which is shown to outperform the best methods reported in the literature.
arXiv Detail & Related papers (2021-01-22T19:49:47Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.