ToDD: Topological Compound Fingerprinting in Computer-Aided Drug
Discovery
- URL: http://arxiv.org/abs/2211.03808v1
- Date: Mon, 7 Nov 2022 19:00:05 GMT
- Title: ToDD: Topological Compound Fingerprinting in Computer-Aided Drug
Discovery
- Authors: Andac Demir, Baris Coskunuzer, Ignacio Segovia-Dominguez, Yuzhou Chen,
Yulia Gel, Bulent Kiziltan
- Abstract summary: In computer-aided drug discovery (CADD), virtual screening is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds.
To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors.
We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates.
- Score: 8.620443111346523
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In computer-aided drug discovery (CADD), virtual screening (VS) is used for
identifying the drug candidates that are most likely to bind to a molecular
target in a large library of compounds. Most VS methods to date have focused on
using canonical compound representations (e.g., SMILES strings, Morgan
fingerprints) or generating alternative fingerprints of the compounds by
training progressively more complex variational autoencoders (VAEs) and graph
neural networks (GNNs). Although VAEs and GNNs led to significant improvements
in VS performance, these methods suffer from reduced performance when scaling
to large virtual compound datasets. The performance of these methods has shown
only incremental improvements in the past few years. To address this problem,
we developed a novel method using multiparameter persistence (MP) homology that
produces topological fingerprints of the compounds as multidimensional vectors.
Our primary contribution is framing the VS process as a new topology-based
graph ranking problem by partitioning a compound into chemical substructures
informed by the periodic properties of its atoms and extracting their
persistent homology features at multiple resolution levels. We show that the
margin loss fine-tuning of pretrained Triplet networks attains highly
competitive results in differentiating between compounds in the embedding space
and ranking their likelihood of becoming effective drug candidates. We further
establish theoretical guarantees for the stability properties of our proposed
MP signatures, and demonstrate that our models, enhanced by the MP signatures,
outperform state-of-the-art methods on benchmark datasets by a wide and highly
statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain
for DUD-E Diverse dataset).
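The margin-loss ranking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding vectors below are hypothetical placeholders standing in for learned embeddings of MP fingerprints, and the function names `euclidean`, `triplet_margin_loss`, and `rank_by_distance` are assumptions introduced only for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_by_distance(query, library):
    """Rank compound names by distance to the query embedding (closest first).

    `library` maps a (hypothetical) compound name to its embedding vector.
    """
    return sorted(library, key=lambda name: euclidean(query, library[name]))


# Toy 2-D embeddings, purely illustrative.
query = [0.0, 0.0]
library = {"decoy": [3.0, 4.0], "active": [0.0, 1.0]}
ranking = rank_by_distance(query, library)
loss = triplet_margin_loss(query, library["active"], library["decoy"])
```

In this toy setup the known active sits closer to the query than the decoy, so the loss is already zero and the active ranks first; swapping the positive and negative roles would yield a positive loss that training would then push down.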
Related papers
- Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference [53.38082028252104]
We introduce HG2M and HG2M+ to combine both HGNN's superior performance and MLP's efficient inference.
HG2M directly trains students with node features as input and soft labels from teacher HGNNs as targets.
HG2Ms demonstrate a 379.24× speedup in inference over HGNNs on the large-scale IGB-3M-19 dataset.
arXiv Detail & Related papers (2024-11-21T11:39:09Z)
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
Multimodal Pretraining DEL-Fusion model (MPDF)
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- ADMET property prediction through combinations of molecular fingerprints [0.0]
Random forests or support vector machines paired with extended-connectivity fingerprints consistently outperformed recently developed methods.
A detailed investigation into regression algorithms and molecular fingerprints revealed gradient-boosted decision trees.
We successfully validated our model across 22 Therapeutics Data Commons ADMET benchmarks.
arXiv Detail & Related papers (2023-09-29T22:39:18Z)
- Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation [32.645022002807416]
Multi-layer perceptron (MLP) networks have regained popularity among researchers due to their comparable results to ViT.
We propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolution neural networks (CNNs) and MLP.
arXiv Detail & Related papers (2023-03-23T08:59:09Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR)
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- Pharmacoprint -- a combination of pharmacophore fingerprint and artificial intelligence as a tool for computer-aided drug design [6.053347262128918]
We propose a high-resolution, pharmacophore fingerprint called Pharmacoprint.
It encodes the presence, types, and relationships between pharmacophore features of a molecule.
arXiv Detail & Related papers (2021-10-04T11:36:39Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
- DeepGS: Deep Representation Learning of Graphs and Sequences for Drug-Target Binding Affinity Prediction [8.292330541203647]
We propose a novel end-to-end learning framework, called DeepGS, which uses deep neural networks to extract the local chemical context from amino acids and SMILES sequences.
We have conducted extensive experiments to compare our proposed method with state-of-the-art models including KronRLS, Sim, DeepDTA and DeepCPI.
arXiv Detail & Related papers (2020-03-31T01:35:39Z)
- Adversarial Feature Hallucination Networks for Few-Shot Learning [84.31660118264514]
Adversarial Feature Hallucination Networks (AFHN) is based on conditional Wasserstein Generative Adversarial networks (cWGAN)
Two novel regularizers are incorporated into AFHN to encourage discriminability and diversity of the synthesized features.
arXiv Detail & Related papers (2020-03-30T02:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.