DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries
- URL: http://arxiv.org/abs/2212.00136v1
- Date: Wed, 30 Nov 2022 22:00:24 GMT
- Title: DEL-Dock: Molecular Docking-Enabled Modeling of DNA-Encoded Libraries
- Authors: Kirill Shmilovich, Benson Chen, Theofanis Karaletsos, Mohammad M. Sultan
- Abstract summary: We introduce a new paradigm, DEL-Dock, that combines ligand-based descriptors with 3-D spatial information from docked protein-ligand complexes.
We show that our model is capable of effectively denoising DEL count data to predict molecule enrichment scores.
- Score: 1.290382979353427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DNA-Encoded Library (DEL) technology has enabled significant advances in hit
identification by enabling efficient testing of combinatorially-generated
molecular libraries. DEL screens measure protein binding affinity through
sequencing reads of molecules tagged with unique DNA-barcodes that survive a
series of selection experiments. Computational models have been deployed to
learn the latent binding affinities that are correlated to the sequenced count
data; however, this correlation is often obfuscated by various sources of noise
introduced in its complicated data-generation process. In order to denoise DEL
count data and screen for molecules with good binding affinity, computational
models require the correct assumptions in their modeling structure to capture
the correct signals underlying the data. Recent advances in DEL models have
focused on probabilistic formulations of count data, but existing approaches
have thus far been limited to only utilizing 2-D molecule-level
representations. We introduce a new paradigm, DEL-Dock, that combines
ligand-based descriptors with 3-D spatial information from docked
protein-ligand complexes. 3-D spatial information allows our model to learn
over the actual binding modality rather than using only structure-based
information of the ligand. We show that our model is capable of effectively
denoising DEL count data to predict molecule enrichment scores that are better
correlated with experimental binding affinity measurements compared to prior
works. Moreover, by learning over a collection of docked poses we demonstrate
that our model, trained only on DEL data, implicitly learns to perform good
docking pose selection without requiring external supervision from
expensive-to-source protein crystal structures.
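
The abstract describes two ideas that a small example can make concrete: a probabilistic treatment of count data and learning over a collection of docked poses. The sketch below is illustrative only and is not the authors' implementation. It assumes a plain Poisson likelihood for the counts, a softmax weighting over per-pose scores, and a log-linear link from pooled features to an enrichment rate; all function names, features, and coefficients are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pool_poses(pose_scores, pose_features):
    """Weight each docked pose by softmax(score) and average its features.

    The weights can be read as a soft pose selection: a pose with a high
    weight dominates the pooled representation.
    """
    weights = softmax(pose_scores)
    dim = len(pose_features[0])
    pooled = [
        sum(w * feats[d] for w, feats in zip(weights, pose_features))
        for d in range(dim)
    ]
    return weights, pooled

def enrichment_rate(pooled, coeffs, bias):
    """Log-linear link (assumed here) so the predicted rate stays positive."""
    return math.exp(bias + sum(c * x for c, x in zip(coeffs, pooled)))

def poisson_nll(count, rate):
    """Negative log-likelihood of an observed count under Poisson(rate)."""
    return rate - count * math.log(rate) + math.lgamma(count + 1)

# Toy example: three docked poses of one molecule, two features per pose.
pose_scores = [2.0, 0.5, -1.0]                       # hypothetical learned scores
pose_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical pose features
weights, pooled = pool_poses(pose_scores, pose_features)
rate = enrichment_rate(pooled, coeffs=[0.8, -0.3], bias=0.1)
loss = poisson_nll(count=7, rate=rate)                # observed read count of 7
```

In a trained model the pose scores and coefficients would be learned by minimizing the summed loss over all molecules, so the pose weights emerge from the count data alone, which is the sense in which pose selection needs no external supervision.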
Related papers
- Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z)
- DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
- Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules will be much lower to facilitate detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
arXiv Detail & Related papers (2024-04-24T03:25:53Z)
- Data Augmentation Scheme for Raman Spectra with Highly Correlated Annotations [0.23090185577016453]
We exploit the additive nature of spectra in order to generate additional data points from a given dataset that have statistically independent labels.
We show that training a CNN on these generated data points improves the performance on datasets where the annotations do not bear the same correlation as the dataset that was used for model training.
arXiv Detail & Related papers (2024-02-01T18:46:28Z)
- Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction [9.948710779498487]
We construct multimodal deep learning models to cover different molecular representations.
Our multimodal fused deep learning (MMFDL) models outperform mono-modal models in accuracy, reliability, and robustness to noise.
arXiv Detail & Related papers (2023-12-29T07:19:42Z)
- Compositional Deep Probabilistic Models of DNA Encoded Libraries [6.206196935093064]
We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks.
Our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure.
arXiv Detail & Related papers (2023-10-20T19:04:28Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- From Static to Dynamic Structures: Improving Binding Affinity Prediction with a Graph-Based Deep Learning Model [33.92165575735532]
Accurate prediction of the protein-ligand binding affinities is an essential challenge in the structure-based drug design.
Here, we curated an MD dataset containing 3,218 different protein-ligand complexes, and developed Dynaformer, a graph-based deep learning model.
Dynaformer was able to accurately predict the binding affinities by learning the geometric characteristics of the protein-ligand interactions from the MD trajectories.
arXiv Detail & Related papers (2022-08-19T14:55:12Z)
- Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function [1.5559232742666467]
We show a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function.
We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2.
arXiv Detail & Related papers (2021-08-27T19:37:06Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)