BindCLIP: A Unified Contrastive-Generative Representation Learning Framework for Virtual Screening
- URL: http://arxiv.org/abs/2602.15236v1
- Date: Mon, 16 Feb 2026 22:26:55 GMT
- Title: BindCLIP: A Unified Contrastive-Generative Representation Learning Framework for Virtual Screening
- Authors: Anjie Qiao, Zhen Wang, Yaliang Li, Jiahua Rao, Yuedong Yang
- Abstract summary: BindCLIP is a unified contrastive-generative representation learning framework for virtual screening. We show BindCLIP achieves substantial gains on challenging out-of-distribution virtual screening. Results indicate that integrating generative, pose-level supervision with contrastive learning yields more interaction-aware embeddings.
- Score: 46.26554693977487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual screening aims to efficiently identify active ligands from massive chemical libraries for a given target pocket. Recent CLIP-style models such as DrugCLIP enable scalable virtual screening by embedding pockets and ligands into a shared space. However, our analyses indicate that such representations can be insensitive to fine-grained binding interactions and may rely on shortcut correlations in training data, limiting their ability to rank ligands by true binding compatibility. To address these issues, we propose BindCLIP, a unified contrastive-generative representation learning framework for virtual screening. BindCLIP jointly trains pocket and ligand encoders using CLIP-style contrastive learning together with a pocket-conditioned diffusion objective for binding pose generation, so that pose-level supervision directly shapes the retrieval embedding space toward interaction-relevant features. To further mitigate shortcut reliance, we introduce hard-negative augmentation and a ligand-ligand anchoring regularizer that prevents representation collapse. Experiments on two public benchmarks demonstrate consistent improvements over strong baselines. BindCLIP achieves substantial gains on challenging out-of-distribution virtual screening and improves ligand-analogue ranking on the FEP+ benchmark. Together, these results indicate that integrating generative, pose-level supervision with contrastive learning yields more interaction-aware embeddings and improves generalization in realistic screening settings, bringing virtual screening closer to real-world applicability.
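The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature value, the additive weighting in `joint_loss`, and the use of a plain scalar as a stand-in for the pocket-conditioned diffusion loss are all assumptions made for illustration. The abstract does not state how the two terms are combined, and the hard-negative augmentation and ligand-ligand anchoring regularizer are omitted here.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def mean_diag_nll(logits):
    """Mean negative log-softmax of the diagonal entry of each row.

    Row i's diagonal entry is the score of the matched pair i.
    """
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def clip_contrastive_loss(pocket_emb, ligand_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched pocket/ligand pairs.

    pocket_emb[i] and ligand_emb[i] are assumed to be a true binding pair;
    every other ligand in the batch acts as a negative for pocket i.
    The temperature is an assumed hyperparameter, not taken from the paper.
    """
    p = [l2_normalize(v) for v in pocket_emb]
    l = [l2_normalize(v) for v in ligand_emb]
    # Cosine similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(pi, lj)) / temperature for lj in l]
              for pi in p]
    logits_t = [list(col) for col in zip(*logits)]  # ligand-to-pocket view
    return 0.5 * (mean_diag_nll(logits) + mean_diag_nll(logits_t))

def joint_loss(contrastive, diffusion, weight=1.0):
    """Hypothetical combination: contrastive term plus a weighted
    generative (diffusion) term. The weighting scheme is an assumption."""
    return contrastive + weight * diffusion

# Toy batch: two pockets and two ligands in a 2-D embedding space.
pockets = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]   # ligand i aligns with pocket i
swapped = [[0.0, 1.0], [1.0, 0.0]]   # ligand i aligns with the wrong pocket

loss_matched = clip_contrastive_loss(pockets, matched)
loss_swapped = clip_contrastive_loss(pockets, swapped)
total = joint_loss(loss_matched, diffusion=0.5, weight=1.0)
```

In this toy batch the matched pairing yields a loss near zero while the swapped pairing is heavily penalized, which is the behavior that lets a shared embedding space rank ligands for a pocket by similarity.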
Related papers
- PromptCD: Test-Time Behavior Enhancement via Polarity-Prompt Contrastive Decoding [85.22047087898311]
We introduce Polarity-Prompt Contrastive Decoding (PromptCD), a test-time behavior control method that generalizes contrastive decoding to broader enhancement settings. PromptCD constructs paired positive and negative guiding prompts for a target behavior and contrasts model responses to reinforce desirable outcomes. Experiments on the "3H" alignment objectives demonstrate consistent and substantial improvements, indicating that post-trained models can achieve meaningful self-enhancement purely at test time.
arXiv Detail & Related papers (2026-02-24T08:56:52Z) - CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision [0.08699280339422537]
We propose CLIP-Joint-Detect, a framework that integrates CLIP-style contrastive vision-language supervision through end-to-end joint training. A lightweight parallel head projects region or grid features into the CLIP embedding space and aligns them with learnable class-specific text embeddings via InfoNCE contrastive loss and an auxiliary cross-entropy term. We validate it on Pascal VOC 2007+2012 using Faster R-CNN and on the large-scale MS 2017 benchmark using modern YOLO detectors (YOLOv11).
arXiv Detail & Related papers (2025-12-28T15:21:20Z) - Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery [3.1716746406651457]
This work introduces a sequence-based drug-target interaction framework that integrates structural priors into protein representations. The model achieves state-of-the-art performance on Human and BioSNAP datasets and remains competitive on BindingDB.
arXiv Detail & Related papers (2025-09-18T09:38:46Z) - Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception [71.26728044621458]
DeCLIP is a novel framework that enhances CLIP by decoupling the self-attention module to obtain "content" and "context" features respectively. It consistently achieves state-of-the-art performance across a broad spectrum of tasks, including 2D detection and segmentation, 3D instance segmentation, video instance segmentation, and 6D object pose estimation.
arXiv Detail & Related papers (2025-08-15T06:43:51Z) - CLIPin: A Non-contrastive Plug-in to CLIP for Multimodal Semantic Alignment [28.2773807732662]
Large-scale natural image-text datasets often suffer from loose semantic alignment due to weak supervision. We propose CLIPin, a unified non-contrastive plug-in that can be seamlessly integrated into CLIP-style architectures. Two shared robustness pre-projectors are designed for image and text modalities respectively to facilitate the integration of contrastive and non-contrastive learning.
arXiv Detail & Related papers (2025-08-08T16:23:05Z) - Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning [11.752632557524969]
Causal CLIP Adapter (CCA) is a novel framework that explicitly disentangles visual features extracted from CLIP. Our method consistently outperforms state-of-the-art approaches in terms of few-shot performance and robustness to distributional shifts.
arXiv Detail & Related papers (2025-08-05T05:30:42Z) - Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering [16.02820746003461]
Graph contrastive learning (GCL) has gradually become a dominant approach in recommender systems. In this paper, we reveal via theoretical derivation that the gradient descent process of the CL objective is formally equivalent to graph convolution. We propose a novel neighborhood aggregation objective to bring users closer to all interacted items while pushing them away from other positive pairs.
arXiv Detail & Related papers (2025-04-14T11:22:41Z) - Cross-Modal Consistency Learning for Sign Language Recognition [92.44927164283641]
Existing pre-training methods solely focus on the compact pose data. We propose a Cross-modal Consistency Learning framework (CCL-SLR). CCL-SLR learns from both RGB and pose modalities based on self-supervised pre-training.
arXiv Detail & Related papers (2025-03-16T12:34:07Z) - Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
We propose a novel framework that aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes. The core idea is to map users and items into discrete codes rich in collaborative information for reliable and informative contrastive view generation.
arXiv Detail & Related papers (2024-09-09T14:04:17Z) - Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z) - Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)