DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries
- URL: http://arxiv.org/abs/2212.00136v1
- Date: Wed, 30 Nov 2022 22:00:24 GMT
- Title: DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries
- Authors: Kirill Shmilovich, Benson Chen, Theofanis Karaletsos, Mohammad M. Sultan
- Abstract summary: We introduce a new paradigm, DEL-Dock, that combines ligand-based descriptors with 3-D spatial information from docked protein-ligand complexes.
We show that our model is capable of effectively denoising DEL count data to predict molecule enrichment scores.
- Score: 1.290382979353427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DNA-Encoded Library (DEL) technology has enabled significant advances in hit
identification by enabling efficient testing of combinatorially-generated
molecular libraries. DEL screens measure protein binding affinity through
sequencing reads of molecules tagged with unique DNA-barcodes that survive a
series of selection experiments. Computational models have been deployed to
learn the latent binding affinities that are correlated to the sequenced count
data; however, this correlation is often obfuscated by various sources of noise
introduced in its complicated data-generation process. In order to denoise DEL
count data and screen for molecules with good binding affinity, computational
models require the correct assumptions in their modeling structure to capture
the correct signals underlying the data. Recent advances in DEL models have
focused on probabilistic formulations of count data, but existing approaches
have thus far been limited to only utilizing 2-D molecule-level
representations. We introduce a new paradigm, DEL-Dock, that combines
ligand-based descriptors with 3-D spatial information from docked
protein-ligand complexes. 3-D spatial information allows our model to learn
over the actual binding modality rather than using only structure-based
information of the ligand. We show that our model is capable of effectively
denoising DEL count data to predict molecule enrichment scores that are better
correlated with experimental binding affinity measurements compared to prior
works. Moreover, by learning over a collection of docked poses we demonstrate
that our model, trained only on DEL data, implicitly learns to perform good
docking pose selection without requiring external supervision from
expensive-to-source protein crystal structures.
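The abstract does not spell out the model architecture or the likelihood. The following is a minimal sketch of the two ideas it names: learning over a collection of docked poses, and a probabilistic treatment of count data. Softmax attention pooling, the plain Poisson likelihood, and all names (`pool_poses`, `poisson_nll`, `head_weights`) are illustrative assumptions, not the authors' method.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pool_poses(pose_features, pose_scores):
    """Attention-weighted average of per-pose feature vectors.

    pose_features: one feature vector per docked pose (hypothetical inputs).
    pose_scores:   one unnormalized attention score per pose; in a trained
                   model these would be learned, here they are given.
    """
    weights = softmax(pose_scores)
    dim = len(pose_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, pose_features))
        for d in range(dim)
    ]
    return pooled, weights

def poisson_nll(count, rate):
    """Negative log-likelihood of an observed count under Poisson(rate).

    Assumption: a plain Poisson stands in for whatever count
    distribution a real DEL model would use.
    """
    return rate - count * math.log(rate) + math.lgamma(count + 1)

# Toy example: three docked poses of one molecule, two features each.
pose_features = [[0.2, 1.0], [0.4, 0.0], [0.9, 0.5]]
pose_scores = [2.0, 0.5, -1.0]
pooled, weights = pool_poses(pose_features, pose_scores)

# Hypothetical linear head mapping the pooled features to a positive rate.
head_weights = [0.5, 1.0]
rate = math.exp(sum(w * x for w, x in zip(head_weights, pooled)))

print("attention weights:", weights)
print("pooled features:", pooled)
print("NLL of observing 3 reads:", poisson_nll(3, rate))
```

In a sketch like this, the attention weights double as a ranking of poses: a pose that receives most of the weight is the one the model relies on, which is one way a model trained only on counts could end up performing pose selection.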
Related papers
- BAPULM: Binding Affinity Prediction using Language Models [7.136205674624813]
We introduce BAPULM, an innovative sequence-based framework that leverages the chemical latent representations of proteins via ProtT5-XL-U50 and of ligands through MolFormer.
Our approach was validated extensively on benchmark datasets, achieving sequential scoring power (R) values of 0.925 $\pm$ 0.043, 0.914 $\pm$ 0.004, and 0.8132 $\pm$ 0.001 on benchmark1k2101, Test2016_290, and CSAR-HiQ_36, respectively.
arXiv Detail & Related papers (2024-11-06T04:35:30Z) - KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors [2.0179908661487986]
We present KinDEL, one of the first large, publicly available DEL datasets on two kinases.
We benchmark different machine learning techniques to develop predictive models for hit identification.
We provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules.
arXiv Detail & Related papers (2024-10-11T16:03:58Z) - Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
We propose the Multimodal Pretraining DEL-Fusion model (MPDF).
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z) - Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z) - Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z) - Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction [9.388979080270103]
We construct multimodal deep learning models to cover different molecular representations.
Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and resistance capability against noise.
arXiv Detail & Related papers (2023-12-29T07:19:42Z) - Compositional Deep Probabilistic Models of DNA Encoded Libraries [6.206196935093064]
We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks.
Our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure.
arXiv Detail & Related papers (2023-10-20T19:04:28Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function [1.5559232742666467]
We show a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function.
We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2.
arXiv Detail & Related papers (2021-08-27T19:37:06Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z) - A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)