Hashing based Contrastive Learning for Virtual Screening
- URL: http://arxiv.org/abs/2407.19790v1
- Date: Mon, 29 Jul 2024 08:33:49 GMT
- Title: Hashing based Contrastive Learning for Virtual Screening
- Authors: Jin Han, Yun Hong, Wu-Jun Li
- Abstract summary: We propose a hashing-based contrastive learning method, called DrugHash, for virtual screening (VS).
DrugHash casts VS as a retrieval task and performs retrieval with efficient binary hash codes.
Experimental results show that DrugHash outperforms existing methods and achieves state-of-the-art accuracy.
- Score: 17.21872872618248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual screening (VS) is a critical step in computer-aided drug discovery, aiming to identify molecules that bind to a specific target receptor, such as a protein. Traditional VS methods, such as docking, are often too time-consuming for screening large-scale molecular databases. Recent advances in deep learning have demonstrated that learning vector representations for both proteins and molecules using contrastive learning can outperform traditional docking methods. However, given that target databases often contain billions of molecules, the real-valued vector representations adopted by existing methods can still incur significant memory and time costs in VS. To address this problem, in this paper we propose a hashing-based contrastive learning method, called DrugHash, for VS. DrugHash treats VS as a retrieval task and performs retrieval with efficient binary hash codes. In particular, DrugHash designs a simple yet effective hashing strategy to enable end-to-end learning of binary hash codes for both the protein and molecule modalities, which dramatically reduces memory and time costs while achieving higher accuracy than existing methods. Experimental results show that DrugHash outperforms existing methods and achieves state-of-the-art accuracy, with a 32$\times$ memory saving and a 3.5$\times$ speed improvement.
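The combination described here (a binary hashing head on top of each modality encoder, a straight-through sign activation, and a contrastive objective across the protein and molecule modalities) can be sketched as follows. This is a minimal illustration under assumed names (HashHead, contrastive_hash_loss, the placeholder encoders) and an assumed code length; it is not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Projects an encoder embedding to an n_bits-dimensional binary code.

    Uses sign() in the forward pass and a straight-through estimator
    (identity gradient) so the binary codes remain end-to-end trainable.
    """
    def __init__(self, in_dim: int, n_bits: int = 1024):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.proj(x))      # relaxed codes in (-1, 1)
        b = torch.sign(h)                 # hard binary codes in {-1, +1}
        return h + (b - h).detach()       # forward pass uses b, gradients flow through h

def contrastive_hash_loss(protein_codes, molecule_codes, temperature=0.07):
    """Symmetric InfoNCE-style loss over matched (pocket, molecule) pairs in a batch."""
    p = F.normalize(protein_codes, dim=-1)
    m = F.normalize(molecule_codes, dim=-1)
    logits = p @ m.t() / temperature      # similarity of every pocket to every molecule
    targets = torch.arange(p.size(0), device=p.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Hypothetical usage with placeholder encoders (protein_enc and molecule_enc are assumptions):
# zp = protein_head(protein_enc(pockets))      # (B, n_bits) binary protein-pocket codes
# zm = molecule_head(molecule_enc(molecules))  # (B, n_bits) binary molecule codes
# loss = contrastive_hash_loss(zp, zm)
```

At search time the learned codes can be packed into bit arrays and compared with Hamming distance, which is where the reported memory and speed savings come from.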
Related papers
- Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing [71.47723696190184]
We propose an innovative Bit-mask Robust Contrastive knowledge Distillation (BRCD) method for semantic hashing.
BRCD is specifically devised for the distillation of semantic hashing models.
arXiv Detail & Related papers (2024-03-10T03:33:59Z) - DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening [16.31607535765497]
DrugCLIP is a novel contrastive learning framework for virtual screening.
It can align representations of binding protein pockets and molecules from a large quantity of pairwise data without explicit binding-affinity scores.
It significantly outperforms traditional docking and supervised learning methods on diverse virtual screening benchmarks.
arXiv Detail & Related papers (2023-10-10T07:08:35Z) - A Lower Bound of Hash Codes' Performance [122.88252443695492]
In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance.
We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it, which results in a low-bias optimization.
By testing on a series of hashing models, we obtain performance improvements across all of them, with up to a 26.5% increase in mean Average Precision and up to a 20.5% increase in accuracy.
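As a rough illustration of the two quantities named above, the sketch below computes an intra-class compactness term and an inter-class distinctiveness term on relaxed hash codes; the margin value and the use of class centers are assumptions, and the paper's posterior-estimation surrogate is not reproduced here.

```python
import torch
import torch.nn.functional as F

def compactness_and_distinctiveness(codes: torch.Tensor,
                                    labels: torch.Tensor,
                                    margin: float = 0.5):
    """codes: relaxed hash codes in (-1, 1), shape (B, n_bits); labels: shape (B,).
    Intra-class compactness pulls codes toward their class center;
    inter-class distinctiveness pushes class centers apart by a margin."""
    codes = F.normalize(codes, dim=-1)
    classes = labels.unique()
    centers = torch.stack([codes[labels == c].mean(dim=0) for c in classes])
    centers = F.normalize(centers, dim=-1)

    # compactness: mean cosine distance of each code to its own class center
    center_of = centers[torch.searchsorted(classes, labels)]
    compact = (1 - (codes * center_of).sum(dim=-1)).mean()

    # distinctiveness: penalize pairs of class centers closer than the margin
    sim = centers @ centers.t()
    off_diag = sim - torch.eye(len(classes), device=sim.device)
    distinct = F.relu(off_diag - margin).sum() / max(len(classes) * (len(classes) - 1), 1)
    return compact, distinct
```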
arXiv Detail & Related papers (2022-10-12T03:30:56Z) - Self-Distilled Hashing for Deep Image Retrieval [25.645550298697938]
In hash-based image retrieval systems, an augmented (transformed) version of an input usually produces a hash code different from that of the original image.
We propose a novel self-distilled hashing scheme to minimize the discrepancy while exploiting the potential of augmented data.
We also introduce hash proxy-based similarity learning and a binary cross-entropy-based quantization loss to produce high-quality hash codes.
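A minimal sketch of the self-distillation idea, assuming a generic hash_net and augmentation pipeline (both placeholders): two augmented views of the same image should yield nearly identical codes.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(codes_view1: torch.Tensor,
                           codes_view2: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between relaxed hash codes of two augmented
    views of the same image (cosine-based, so it is scale-invariant)."""
    v1 = F.normalize(codes_view1, dim=-1)
    v2 = F.normalize(codes_view2, dim=-1)
    return (1 - (v1 * v2).sum(dim=-1)).mean()

# Hypothetical usage:
# codes_view1 = hash_net(augment(images))   # hash_net and augment are assumed placeholders
# codes_view2 = hash_net(augment(images))
# loss = self_distillation_loss(codes_view1, codes_view2)
```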
arXiv Detail & Related papers (2021-12-16T12:01:50Z) - One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective [86.48094395282546]
A deep hashing model typically has two main learning objectives: to make the learned binary hash codes discriminative and to minimize a quantization error.
We propose a novel deep hashing model with only a single learning objective.
Our model is highly effective, outperforming the state-of-the-art multi-loss hashing models on three large-scale instance retrieval benchmarks.
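One way to realize such a single cosine-similarity objective is to maximize the cosine similarity between the network output and a fixed binary target code per class, which makes the codes discriminative and pushes them toward binary values at the same time. The random target codes and the scaling factor below are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def single_cosine_objective(codes: torch.Tensor,
                            labels: torch.Tensor,
                            class_targets: torch.Tensor,
                            scale: float = 8.0) -> torch.Tensor:
    """codes: (B, n_bits) continuous network outputs; labels: (B,) class indices.
    class_targets: (C, n_bits) fixed binary codes in {-1, +1}, one per class.
    Maximizing cosine similarity to a binary target makes codes discriminative
    and drives them toward binary values, so no separate quantization loss is needed."""
    cos = F.normalize(codes, dim=-1) @ F.normalize(class_targets.float(), dim=-1).t()
    return F.cross_entropy(scale * cos, labels)

# One common (assumed) choice of targets: random sign vectors per class.
# class_targets = (torch.randint(0, 2, (num_classes, n_bits)) * 2 - 1).float()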
arXiv Detail & Related papers (2021-09-29T14:27:51Z) - MOON: Multi-Hash Codes Joint Learning for Cross-Media Retrieval [30.77157852327981]
Cross-media hashing techniques have attracted increasing attention for their high computation efficiency and low storage cost.
We develop a novel Multiple hash cOdes jOint learNing method (MOON) for cross-media retrieval.
arXiv Detail & Related papers (2021-08-17T14:47:47Z) - CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z) - Procrustean Orthogonal Sparse Hashing [3.302605292858623]
We show that insect olfaction is structurally and functionally analogous to sparse hashing.
We present a novel method, Procrustean Orthogonal Sparse Hashing (POSH), that unifies these findings.
We propose two new methods, Binary OSL and SphericalHash, to address these deficiencies.
arXiv Detail & Related papers (2020-06-08T18:09:33Z) - Targeted Attack for Deep Hashing based Retrieval [57.582221494035856]
We propose a novel method, dubbed deep hashing targeted attack (DHTA), to study the targeted attack on such retrieval.
We first formulate the targeted attack as a point-to-set optimization, which minimizes the average distance between the hash code of an adversarial example and those of a set of objects with the target label.
To balance performance and perceptibility, we propose to minimize the Hamming distance between the hash code of the adversarial example and the anchor code under an $\ell_\infty$ restriction on the perturbation.
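The point-to-set idea with an $\ell_\infty$ budget can be sketched as a PGD-style loop that minimizes a differentiable surrogate (cosine distance) of the Hamming distance to the anchor code; the step size, iteration count, and surrogate choice below are assumptions, not the exact DHTA procedure.

```python
import torch
import torch.nn.functional as F

def targeted_hash_attack(model, image, anchor_code, eps=8/255, alpha=1/255, steps=50):
    """PGD-style sketch: perturb `image` within an L_inf ball of radius `eps`
    so that the (relaxed) hash code moves toward a target `anchor_code` in {-1, +1}.
    `anchor_code` is assumed to have shape (B, n_bits)."""
    x_adv = image.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        code = model(x_adv)                                   # relaxed hash code
        loss = 1 - F.cosine_similarity(code.flatten(1), anchor_code.flatten(1)).mean()
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()               # step toward the anchor code
            x_adv = image + (x_adv - image).clamp(-eps, eps)  # project back into the L_inf ball
            x_adv = x_adv.clamp(0, 1)                         # keep a valid image range
    return x_adv.detach()
```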
arXiv Detail & Related papers (2020-04-15T08:36:58Z) - A Survey on Deep Hashing Methods [52.326472103233854]
Nearest neighbor search aims to retrieve the database samples with the smallest distances to a given query.
With the development of deep learning, deep hashing methods show more advantages than traditional methods.
Deep supervised hashing is categorized into pairwise methods, ranking-based methods, pointwise methods and quantization.
Deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods and prediction-free self-supervised learning-based methods.
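The practical payoff of hashing for nearest neighbor search is that K-bit codes packed into bytes occupy only K/8 bytes each, and Hamming distance reduces to XOR plus popcount. A small NumPy sketch with made-up random codes:

```python
import numpy as np

def pack_codes(codes_pm1: np.ndarray) -> np.ndarray:
    """Pack {-1, +1} codes of shape (N, n_bits) into uint8 arrays of shape (N, n_bits // 8)."""
    bits = (codes_pm1 > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming_topk(query_packed: np.ndarray, db_packed: np.ndarray, k: int = 10):
    """Return indices and distances of the k database codes closest to the query."""
    xor = np.bitwise_xor(db_packed, query_packed)      # (N, n_bits // 8)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)     # popcount per row = Hamming distance
    return np.argsort(dists)[:k], np.sort(dists)[:k]

# Toy usage with random codes (illustrative data only).
rng = np.random.default_rng(0)
db = np.sign(rng.standard_normal((10_000, 1024)))      # 10k database codes, 1024 bits each
q = np.sign(rng.standard_normal((1, 1024)))
idx, d = hamming_topk(pack_codes(q), pack_codes(db))
print(idx[:5], d[:5])
```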
arXiv Detail & Related papers (2020-03-04T08:25:15Z) - Image Hashing by Minimizing Discrete Component-wise Wasserstein Distance [12.968141477410597]
Adversarial autoencoders are shown to be able to implicitly learn a robust, locality-preserving hash function that generates balanced and high-quality hash codes.
Existing adversarial hashing methods are too inefficient to be employed in large-scale image retrieval applications.
We propose a new adversarial-autoencoder hashing approach that has a much lower sample requirement and computational cost.
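For equal-size one-dimensional samples, the Wasserstein-1 distance is simply the mean absolute difference of sorted values, so a component-wise version over hash-code dimensions is cheap to compute. The target distribution below (balanced ±1 samples) is an illustrative assumption, not the paper's exact training setup.

```python
import torch

def componentwise_wasserstein(codes: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """codes, target: shape (B, n_bits). Per dimension, the Wasserstein-1 distance
    between two equal-size empirical distributions equals the mean absolute
    difference of their sorted samples; we average over all dimensions."""
    codes_sorted, _ = torch.sort(codes, dim=0)
    target_sorted, _ = torch.sort(target, dim=0)
    return (codes_sorted - target_sorted).abs().mean()

# Illustrative target: samples from a balanced binary {-1, +1} distribution.
# target = (torch.randint(0, 2, codes.shape) * 2 - 1).to(codes.dtype)
```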
arXiv Detail & Related papers (2020-02-29T00:22:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.