DrugCLIP: Contrastive Protein-Molecule Representation Learning for
Virtual Screening
- URL: http://arxiv.org/abs/2310.06367v1
- Date: Tue, 10 Oct 2023 07:08:35 GMT
- Authors: Bowen Gao, Bo Qiang, Haichuan Tan, Minsi Ren, Yinjun Jia, Minsi Lu,
Jingjing Liu, Weiying Ma, Yanyan Lan
- Abstract summary: DrugCLIP is a novel contrastive learning framework for virtual screening.
It can align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores.
It significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks.
- Score: 16.31607535765497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual screening, which identifies potential drugs from vast compound
databases to bind with a particular protein pocket, is a critical step in
AI-assisted drug discovery. Traditional docking methods are highly
time-consuming, and can only work with a restricted search library in real-life
applications. Recent supervised learning approaches using scoring functions for
binding-affinity prediction, although promising, have not yet surpassed docking
methods due to their strong dependency on limited data with reliable
binding-affinity labels. In this paper, we propose a novel contrastive learning
framework, DrugCLIP, by reformulating virtual screening as a dense retrieval
task and employing contrastive learning to align representations of binding
protein pockets and molecules from a large quantity of pairwise data without
explicit binding-affinity scores. We also introduce a biological-knowledge
inspired data augmentation strategy to learn better protein-molecule
representations. Extensive experiments show that DrugCLIP significantly
outperforms traditional docking and supervised learning methods on diverse
virtual screening benchmarks with greatly reduced computation time, especially
in the zero-shot setting.
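As an illustration of the dense-retrieval formulation described in the abstract, the following is a minimal sketch of a generic CLIP-style symmetric contrastive loss over paired pocket and molecule embeddings, followed by cosine-similarity ranking of a compound library. It is not the authors' implementation: the function names, the temperature of 0.07, and the toy two-dimensional embeddings are all assumptions made for illustration, and in practice the embeddings would come from trained pocket and molecule encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(pockets, molecules, temperature=0.07):
    """Temperature-scaled cosine similarity between every pocket and every molecule."""
    p_unit = [l2_normalize(p) for p in pockets]
    m_unit = [l2_normalize(m) for m in molecules]
    return [
        [sum(a * b for a, b in zip(p, m)) / temperature for m in m_unit]
        for p in p_unit
    ]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(pockets, molecules, temperature=0.07):
    """Average of the pocket-to-molecule and molecule-to-pocket losses.

    pockets[i] and molecules[i] are assumed to be a known binding pair;
    all other pairings in the batch act as negatives.
    """
    logits = similarity_matrix(pockets, molecules, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def rank_molecules(pocket, library):
    """Return library indices sorted from most to least similar to the pocket."""
    query = l2_normalize(pocket)
    scores = [sum(a * b for a, b in zip(query, l2_normalize(m))) for m in library]
    return sorted(range(len(library)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical values, not real model outputs).
toy_pockets = [[1.0, 0.0], [0.0, 1.0]]
toy_aligned = [[0.9, 0.1], [0.1, 0.9]]      # molecule i points toward pocket i
toy_misaligned = [[0.1, 0.9], [0.9, 0.1]]   # pairs swapped

loss_aligned = symmetric_contrastive_loss(toy_pockets, toy_aligned)
loss_misaligned = symmetric_contrastive_loss(toy_pockets, toy_misaligned)
ranking = rank_molecules([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]])
print(loss_aligned, loss_misaligned, ranking)
```

In this sketch the loss is lower when each molecule embedding points toward its paired pocket. Screening a library needs only one similarity computation per molecule once the embeddings are precomputed, which is the kind of saving the abstract contrasts with docking.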
Related papers
- S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search [30.071862398889774]
We propose S-MolSearch, the first framework to leverage molecular 3D information and affinity information in contrastive learning for virtual screening.
S-MolSearch efficiently processes both labeled and unlabeled data, training molecular structural encoders while generating soft labels for the unlabeled data.
It surpasses both structure-based and ligand-based virtual screening methods on enrichment factors at 0.5%, 1% and 5%.
arXiv Detail & Related papers (2024-08-27T14:51:11Z)
- Hashing based Contrastive Learning for Virtual Screening [17.21872872618248]
We propose a hashing-based contrastive learning method, called DrugHash, for virtual screening (VS).
DrugHash treats VS as a retrieval task that uses efficient binary hash codes for retrieval.
Experimental results show that DrugHash can outperform existing methods to achieve state-of-the-art accuracy.
arXiv Detail & Related papers (2024-07-29T08:33:49Z)
- Understanding active learning of molecular docking and its applications [0.6554326244334868]
We investigate how active learning methodologies effectively predict docking scores using only 2D structures.
Our findings suggest that surrogate models tend to memorize structural patterns prevalent in compounds with high docking scores.
Our comprehensive analysis underscores the reliability and potential applicability of active learning methodologies in virtual screening campaigns.
arXiv Detail & Related papers (2024-06-14T05:43:42Z)
- ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing [70.12220342151113]
ContraNovo is a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides.
ContraNovo consistently outshines contemporary state-of-the-art solutions.
arXiv Detail & Related papers (2023-12-18T12:49:46Z)
- PIGNet2: A Versatile Deep Learning-based Protein-Ligand Interaction Prediction Model for Binding Affinity Scoring and Virtual Screening [0.0]
Prediction of protein-ligand interactions (PLI) plays a crucial role in drug discovery.
The development of a versatile model capable of accurately scoring binding affinity and conducting efficient virtual screening remains a challenge.
Here, we propose a viable solution by introducing a novel data augmentation strategy combined with a physics-informed graph neural network.
arXiv Detail & Related papers (2023-07-03T14:46:49Z)
- HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions [69.14257241250046]
We propose a new contrastive learning approach to train models for skeleton-based action recognition without labels.
Our key contribution is a simple module, HaLP - to Hallucinate Latent Positives for contrastive learning.
We show via experiments that using these generated positives within a standard contrastive learning framework leads to consistent improvements.
arXiv Detail & Related papers (2023-04-01T21:09:43Z)
- Few-Shot Learning for Biometric Verification [2.3226893628361682]
In machine learning applications, it is common practice to feed the model as much information as possible. In most cases, the model can handle large data sets that allow it to predict more accurately.
We propose a novel end-to-end lightweight architecture that verifies biometric data through few-shot learning methods, producing results competitive with state-of-the-art accuracies.
arXiv Detail & Related papers (2022-11-12T22:49:25Z)
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
arXiv Detail & Related papers (2022-06-20T14:53:25Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines federated learning (FL) and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.