Learning Protein-Ligand Binding in Hyperbolic Space
- URL: http://arxiv.org/abs/2508.15480v1
- Date: Thu, 21 Aug 2025 11:56:25 GMT
- Title: Learning Protein-Ligand Binding in Hyperbolic Space
- Authors: Jianhui Wang, Wenyu Zhu, Bowen Gao, Xin Hong, Ya-Qin Zhang, Wei-Ying Ma, Yanyan Lan
- Abstract summary: HypSeek is a hyperbolic representation learning framework that embeds ligands, protein pockets, and sequences into Lorentz-model hyperbolic space. By leveraging the exponential geometry and negative curvature of hyperbolic space, HypSeek enables expressive, affinity-sensitive embeddings. Our model unifies virtual screening and affinity ranking in a single framework, introducing a protein-guided three-tower architecture.
- Score: 21.421085367060343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein-ligand binding prediction is central to virtual screening and affinity ranking, two fundamental tasks in drug discovery. While recent retrieval-based methods embed ligands and protein pockets into Euclidean space for similarity-based search, the geometry of Euclidean embeddings often fails to capture the hierarchical structure and fine-grained affinity variations intrinsic to molecular interactions. In this work, we propose HypSeek, a hyperbolic representation learning framework that embeds ligands, protein pockets, and sequences into Lorentz-model hyperbolic space. By leveraging the exponential geometry and negative curvature of hyperbolic space, HypSeek enables expressive, affinity-sensitive embeddings that can effectively model both global activity and subtle functional differences, particularly in challenging cases such as activity cliffs, where structurally similar ligands exhibit large affinity gaps. Our model unifies virtual screening and affinity ranking in a single framework, introducing a protein-guided three-tower architecture to enhance representational structure. HypSeek improves early enrichment in virtual screening on DUD-E from 42.63 to 51.44 (+20.7%) and affinity ranking correlation on JACS from 0.5774 to 0.7239 (+25.4%), demonstrating the benefits of hyperbolic geometry across both tasks and highlighting its potential as a powerful inductive bias for protein-ligand modeling.
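To make the phrase "Lorentz-model hyperbolic space" concrete, the following is a minimal sketch of the standard Lorentz (hyperboloid) model operations: the Lorentzian inner product, the exponential map at the origin, and the geodesic distance. It is not HypSeek's implementation; the function names and the curvature parameter `c` (curvature is `-c`) are illustrative assumptions, and the abstract does not specify how embeddings are lifted onto the hyperboloid or how they are scored.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def exp_map_origin(v, c=1.0):
    """Lift a Euclidean (tangent) vector v onto the hyperboloid of curvature -c.

    Points x on the hyperboloid satisfy lorentz_inner(x, x) == -1/c.
    (Hypothetical helper; a common way to obtain hyperbolic embeddings
    from the output of a Euclidean encoder.)
    """
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        # The zero tangent vector maps to the origin of the hyperboloid.
        return [1.0 / math.sqrt(c)] + [0.0] * len(v)
    s = math.sqrt(c) * norm
    time = math.cosh(s) / math.sqrt(c)   # time-like coordinate x0
    scale = math.sinh(s) / s             # rescales the space-like part
    return [time] + [scale * a for a in v]


def lorentz_distance(x, y, c=1.0):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to the domain of acosh to guard against rounding error.
    arg = max(-c * lorentz_inner(x, y), 1.0)
    return math.acosh(arg) / math.sqrt(c)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for encoder outputs (made-up values).
    pocket = exp_map_origin([0.3, -0.1])
    ligand_a = exp_map_origin([0.4, -0.2])
    ligand_b = exp_map_origin([-1.5, 2.0])
    print(lorentz_distance(pocket, ligand_a))
    print(lorentz_distance(pocket, ligand_b))
```

Because hyperbolic distances grow rapidly away from the origin, points with similar directions but different norms can end up far apart, which is the kind of property the abstract appeals to for separating structurally similar ligands with different affinities.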
Related papers
- Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles [74.32932832937618]
We introduce RigidSSL (Rigidity-Aware Self-Supervised Learning), a geometric pretraining framework. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures from the AlphaFold Protein Structure Database with simulated perturbations. Phase II (RigidSSL-MD) refines these representations on 1.3K molecular dynamics trajectories to capture physically realistic transitions.
arXiv Detail & Related papers (2026-03-02T21:32:30Z)
- HypRAG: Hyperbolic Dense Retrieval for Retrieval Augmented Generation [30.711145880382286]
HyTE-FH and HyTE-H are hyperbolic representations projecting pre-trained Euclidean embeddings into hyperbolic space. The Outward Einstein Midpoint is a geometry-aware pooling operator that provably preserves hierarchical structure. Our analysis also reveals that hyperbolic representations encode document specificity through norm-based separation.
arXiv Detail & Related papers (2026-02-08T00:18:05Z)
- LumiX: Structured and Coherent Text-to-Intrinsic Generation [56.659456254026985]
We present LumiX, a structured diffusion framework for coherent text-to-intrinsic generation. LumiX produces coherent and physically meaningful results, achieving 23% higher alignment and a better preference score. It can also perform image-conditioned decomposition within the same framework.
arXiv Detail & Related papers (2025-12-02T13:56:02Z)
- S$^2$Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening [72.89086338778098]
We propose a two-stage framework for protein-ligand contrastive representation learning. In the first stage, we perform protein sequence pretraining on ChEMBL using an ESM2-based backbone. In the second stage, we fine-tune on PDBBind by fusing sequence and structure information through a residue-level gating module. This auxiliary task guides the model to accurately localize binding residues within the protein sequence and capture their 3D spatial arrangement.
arXiv Detail & Related papers (2025-11-10T11:57:47Z)
- scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration [53.683726781791385]
We introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation.
arXiv Detail & Related papers (2025-10-28T21:28:39Z)
- A Novel Framework for Multi-Modal Protein Representation Learning [13.33566214386641]
We propose Diffused and Aligned Multi-modal Protein Embedding (DAMPE), a unified framework that addresses two core mechanisms. First, we propose Optimal Transport (OT)-based representation alignment that establishes correspondence between intrinsic embedding spaces of different modalities. Second, we develop a Conditional Graph Generation (CGG)-based information fusion method, where a condition encoder fuses the aligned intrinsic embeddings to provide informative cues for graph reconstruction.
arXiv Detail & Related papers (2025-10-27T12:33:01Z)
- DOOMGAN: High-Fidelity Dynamic Identity Obfuscation Ocular Generative Morphing [3.9440964696313485]
Ocular biometrics in the visible spectrum have emerged as a prominent modality due to high accuracy, resistance to spoofing, and non-invasive nature. Morphing attacks, synthetic biometric traits created by blending features from multiple individuals, threaten biometric system integrity.
arXiv Detail & Related papers (2025-07-23T02:43:49Z)
- KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction [60.23701115249195]
KEPLA is a novel deep learning framework that integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. Experiments on two benchmark datasets demonstrate that KEPLA consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T08:02:42Z)
- Bidirectional Hierarchical Protein Multi-Modal Representation Learning [4.682021474006426]
Protein language models (pLMs) pretrained on large scale protein sequences have demonstrated significant success in sequence-based tasks. Graph neural networks (GNNs) designed to leverage 3D structural information have shown promising generalization in protein-related prediction tasks. We propose a bidirectional and hierarchical (Bi-Hierarchical) fusion approach to capture richer and more comprehensive protein representations.
arXiv Detail & Related papers (2025-04-07T06:47:49Z)
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function (GSSF) to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric delicately encapsulates two formats of diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- Hyperbolic Image-and-Pointcloud Contrastive Learning for 3D Classification [14.439996427728483]
We propose a hyperbolic image-and-pointcloud contrastive learning method (HyperIPC).
For the intra-modal branch, we rely on the intrinsic geometric structure to explore the hyperbolic embedding representation of point cloud.
For the cross-modal branch, we leverage images to guide the point cloud in establishing strong semantic hierarchical correlations.
arXiv Detail & Related papers (2024-09-24T07:13:37Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z)
- Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, and stability and outperformance over their Euclidean counterparts.
arXiv Detail & Related papers (2020-06-15T08:23:20Z)