Related papers: DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening

URL: http://arxiv.org/abs/2310.06367v1
Date: Tue, 10 Oct 2023 07:08:35 GMT
Title: DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening
Authors: Bowen Gao, Bo Qiang, Haichuan Tan, Minsi Ren, Yinjun Jia, Minsi Lu, Jingjing Liu, Weiying Ma, Yanyan Lan
Abstract summary: DrugCLIP is a novel contrastive learning framework for virtual screening. It can align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores. It significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks.
Score: 16.31607535765497
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Virtual screening, which identifies potential drugs from vast compound databases to bind with a particular protein pocket, is a critical step in AI-assisted drug discovery. Traditional docking methods are highly time-consuming, and can only work with a restricted search library in real-life applications. Recent supervised learning approaches using scoring functions for binding-affinity prediction, although promising, have not yet surpassed docking methods due to their strong dependency on limited data with reliable binding-affinity labels. In this paper, we propose a novel contrastive learning framework, DrugCLIP, by reformulating virtual screening as a dense retrieval task and employing contrastive learning to align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores. We also introduce a biological-knowledge inspired data augmentation strategy to learn better protein-molecule representations. Extensive experiments show that DrugCLIP significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks with highly reduced computation time, especially in the zero-shot setting.
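
The two ideas in the abstract, aligning pocket and molecule embeddings contrastively and then screening by dense retrieval, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the loss, so a CLIP-style symmetric in-batch softmax loss is assumed here, and the function names, the temperature of 0.07, and the toy 2-dimensional embeddings are all hypothetical. A real system would use learned encoders and a tensor library rather than plain lists.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(pockets, molecules):
    """Cosine similarity between every pocket and every molecule embedding."""
    p = [l2_normalize(v) for v in pockets]
    m = [l2_normalize(v) for v in molecules]
    return [[sum(a * b for a, b in zip(pv, mv)) for mv in m] for pv in p]


def _cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(x - mx) for x in logits))
    return log_z - logits[target]


def symmetric_contrastive_loss(pockets, molecules, temperature=0.07):
    """Assumed CLIP-style loss: pockets[i] and molecules[i] are a known pair,
    and every other in-batch combination is treated as a negative.
    No binding-affinity score is needed, only the pairing itself."""
    sim = similarity_matrix(pockets, molecules)
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    # Pocket-to-molecule direction: each row should peak on its diagonal entry.
    p2m = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Molecule-to-pocket direction: same thing over columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    m2p = sum(_cross_entropy(cols[j], j) for j in range(n)) / n
    return 0.5 * (p2m + m2p)


def rank_molecules(pocket, library):
    """Dense retrieval: return library indices sorted by similarity to the pocket."""
    sims = similarity_matrix([pocket], library)[0]
    return sorted(range(len(library)), key=lambda j: sims[j], reverse=True)


if __name__ == "__main__":
    pockets = [[1.0, 0.0], [0.0, 1.0]]
    molecules = [[1.0, 0.0], [0.0, 1.0]]
    print("loss on aligned pairs:", symmetric_contrastive_loss(pockets, molecules))
    print("ranking:", rank_molecules([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))
```

Because molecule embeddings do not depend on the query pocket, they could be computed once for a whole library and reused, so screening reduces to a similarity lookup; this is one plausible reading of the reduced computation time the abstract reports.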

Related papers

S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search [30.071862398889774]
We propose S-MolSearch, the first framework to leverage molecular 3D information and affinity information in semi-supervised contrastive learning. It efficiently processes both labeled and unlabeled data, training molecular structural encoders while generating soft labels for the unlabeled data. Empirically, S-MolSearch demonstrates superior performance on widely-used benchmarks LIT-PCBA and DUD-E.
arXiv Detail & Related papers (2024-08-27T14:51:11Z)
Hashing based Contrastive Learning for Virtual Screening [17.21872872618248]
We propose a hashing-based contrastive learning method, called DrugHash, for virtual screening (VS). DrugHash treats VS as a retrieval task that uses efficient binary hash codes for retrieval. Experimental results show that DrugHash can outperform existing methods to achieve state-of-the-art accuracy.
arXiv Detail & Related papers (2024-07-29T08:33:49Z)
Understanding active learning of molecular docking and its applications [0.6554326244334868]
We investigate how active learning methodologies effectively predict docking scores using only 2D structures. Our findings suggest that surrogate models tend to memorize structural patterns prevalent in compounds with high docking scores. Our comprehensive analysis underscores the reliability and potential applicability of active learning methodologies in virtual screening campaigns.
arXiv Detail & Related papers (2024-06-14T05:43:42Z)
ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing [70.12220342151113]
ContraNovo is a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides. ContraNovo consistently outshines contemporary state-of-the-art solutions.
arXiv Detail & Related papers (2023-12-18T12:49:46Z)
PIGNet2: A Versatile Deep Learning-based Protein-Ligand Interaction Prediction Model for Binding Affinity Scoring and Virtual Screening [0.0]
Prediction of protein-ligand interactions (PLI) plays a crucial role in drug discovery. The development of a versatile model capable of accurately scoring binding affinity and conducting efficient virtual screening remains a challenge. Here, we propose a viable solution by introducing a novel data augmentation strategy combined with a physics-informed graph neural network.
arXiv Detail & Related papers (2023-07-03T14:46:49Z)
HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions [69.14257241250046]
We propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. Our key contribution is a simple module, HaLP - to Hallucinate Latent Positives for contrastive learning. We show via experiments that using these generated positives within a standard contrastive learning framework leads to consistent improvements.
arXiv Detail & Related papers (2023-04-01T21:09:43Z)
Few-Shot Learning for Biometric Verification [2.3226893628361682]
In machine learning applications, it is common practice to feed the model as much information as possible. In most cases, the model can handle large data sets, which allow it to predict more accurately. We propose a novel end-to-end lightweight architecture that verifies biometric data through Few-Shot learning methods and produces results competitive with state-of-the-art accuracies.
arXiv Detail & Related papers (2022-11-12T22:49:25Z)
SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery. Wet experiments remain the most reliable method, but they are time-consuming and resource-intensive. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
arXiv Detail & Related papers (2022-06-20T14:53:25Z)
Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines federated learning (FL) and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos. We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features. Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets. We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
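
The class-imbalance argument in the last entry above can be illustrated with a small sketch. It computes the area under the ROC curve as the fraction of (active, inactive) pairs that are ranked correctly, and contrasts it with plain accuracy on a heavily imbalanced toy screen. The function names and the toy data are hypothetical, and this only shows the metric; it is not the differentiable ROC cost function that paper proposes.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Counts the fraction of (active, inactive) pairs in which the active
    compound receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of compounds classified correctly at a fixed score threshold."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy screen: 1 active among 100 compounds, and a model that scores
    # every compound identically, so it has learned nothing.
    labels = [1] + [0] * 99
    constant_scores = [0.1] * 100
    print("accuracy:", accuracy_at_threshold(constant_scores, labels))
    print("ROC AUC:", roc_auc(constant_scores, labels))
```

On this toy screen the uninformative model looks strong on accuracy simply because inactives dominate, while its AUC sits at chance level, since AUC depends only on how actives are ranked relative to inactives and not on their proportion.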

This list is automatically generated from the titles and abstracts of the papers on this site.