KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
- URL: http://arxiv.org/abs/2512.09365v1
- Date: Wed, 10 Dec 2025 06:55:36 GMT
- Title: KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
- Authors: Jiayu Qin, Zhengquan Luo, Guy Tadmor, Changyou Chen, David Zeevi, Zhiqiang Xu
- Abstract summary: We develop an optimal transport-based approach to generate high-quality pseudo-labels for unlabeled molecule-protein pairs. By treating pseudo-labeling as a mechanism for bridging disparate biological modalities, our approach enables the effective use of heterogeneous data. We evaluate our framework on multiple MPI datasets including virtual screening tasks and protein retrieval tasks.
- Score: 20.031487150263725
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Predicting molecule-protein interactions (MPIs) is a fundamental task in computational biology, with crucial applications in drug discovery and molecular function annotation. However, existing MPI models face two major challenges. First, the scarcity of labeled molecule-protein pairs significantly limits model performance, as available datasets capture only a small fraction of biologically relevant interactions. Second, most methods rely solely on molecular and protein features, ignoring broader biological context such as genes, metabolic pathways, and functional annotations that could provide essential complementary information. To address these limitations, our framework first aggregates diverse biological datasets, including molecule, protein, gene, and pathway-level interactions, and then develops an optimal transport-based approach to generate high-quality pseudo-labels for unlabeled molecule-protein pairs, leveraging the underlying distribution of known interactions to guide label assignment. By treating pseudo-labeling as a mechanism for bridging disparate biological modalities, our approach enables the effective use of heterogeneous data to enhance MPI prediction. We evaluate our framework on multiple MPI datasets, including virtual screening tasks and protein retrieval tasks, demonstrating substantial improvements over state-of-the-art methods in prediction accuracy and zero-shot ability on unseen interactions. Beyond MPI prediction, our approach provides a new paradigm for leveraging diverse biological data sources to tackle problems traditionally constrained by single- or bi-modal learning, paving the way for future advances in computational biology and drug discovery.
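- Illustrative sketch (not from the paper): The abstract does not give the algorithm, so the following is only a generic example of how optimal transport can turn model scores into pseudo-labels whose class proportions follow those of known interactions. It uses standard entropic-regularized Sinkhorn iterations; the function name `sinkhorn_pseudo_labels`, its parameters, and the binary interact / no-interact setup are all assumptions made for illustration and may differ from what KGOT actually does.

```python
import numpy as np

def sinkhorn_pseudo_labels(scores, class_prior, epsilon=0.1, n_iters=200):
    """Soft pseudo-labels via entropic optimal transport (Sinkhorn iterations).

    scores:      (n_pairs, n_classes) model scores for unlabeled pairs (hypothetical input).
    class_prior: (n_classes,) target label proportions, e.g. estimated from labeled pairs.
    """
    n_pairs, _ = scores.shape
    # Gibbs kernel; subtracting the global max keeps the exponentials bounded by 1.
    kernel = np.exp((scores - scores.max()) / epsilon)
    row_marginal = np.full(n_pairs, 1.0 / n_pairs)  # each pair carries equal mass
    col_marginal = np.asarray(class_prior, dtype=float)
    col_marginal = col_marginal / col_marginal.sum()
    u = np.ones(n_pairs)
    for _ in range(n_iters):
        v = col_marginal / (kernel.T @ u)  # rescale columns toward the class prior
        u = row_marginal / (kernel @ v)    # rescale rows toward uniform pair mass
    plan = u[:, None] * kernel * v[None, :]
    # Normalize each row so it reads as a label distribution for that pair.
    return plan / plan.sum(axis=1, keepdims=True)

# Usage: 50 unlabeled pairs, assuming 20% of pairs interact (an assumed prior).
rng = np.random.default_rng(0)
soft_labels = sinkhorn_pseudo_labels(rng.normal(size=(50, 2)), [0.8, 0.2], epsilon=1.0)
```

Each row of `soft_labels` sums to 1, and the column averages approximately match the supplied prior, which is the sense in which the distribution of known interactions guides label assignment in this sketch.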
Related papers
- Rep3Net: An Approach Exploiting Multimodal Representation for Molecular Bioactivity Prediction [0.8049701904919515]
In early-stage drug discovery, bioactivity prediction of molecules against target proteins plays a crucial role. We propose Rep3Net, a unified deep learning architecture that not only incorporates descriptor data but also includes spatial and relational information. Our model employing multimodal features produces reliable bioactivity predictions on the Poly [ADP-ribose] polymerase 1 dataset.
arXiv Detail & Related papers (2025-11-29T15:39:48Z) - Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling [74.25438319700929]
We propose CHMR (Cell-aware Hierarchical Multi-modal Representations), a robust framework that models local-global dependencies between molecules and cellular responses. Evaluated on nine public benchmarks spanning 728 tasks, CHMR outperforms state-of-the-art baselines. Results demonstrate the advantage of hierarchy-aware, multimodal learning for reliable and biologically grounded molecular representations.
arXiv Detail & Related papers (2025-11-26T07:15:00Z) - Towards Interpretable Drug-Drug Interaction Prediction: A Graph-Based Approach with Molecular and Network-Level Explanations [3.6099926707292793]
Drug-drug interactions (DDIs) represent a critical challenge in pharmacology, often leading to adverse drug reactions with significant implications for patient safety and healthcare outcomes. We propose MolecBioNet, a novel graph-based framework that integrates molecular and biomedical knowledge for robust and interpretable DDI prediction.
arXiv Detail & Related papers (2025-07-12T07:43:19Z) - KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction [60.23701115249195]
KEPLA is a novel deep learning framework that integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. Experiments on two benchmark datasets demonstrate that KEPLA consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T08:02:42Z) - Biological Pathway Guided Gene Selection Through Collaborative Reinforcement Learning [25.2831953927341]
We propose a novel framework that integrates statistical selection with biological pathway knowledge using multi-agent reinforcement learning (MARL). Our framework incorporates pathway knowledge through Graph Neural Network-based state representations, a reward mechanism combining prediction performance with gene centrality and pathway coverage, and collaborative learning strategies using shared memory and a centralized critic component.
arXiv Detail & Related papers (2025-05-30T03:01:07Z) - A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery [32.573496601865465]
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications.
arXiv Detail & Related papers (2025-03-06T12:04:56Z) - Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches [48.66541987908136]
Much work has been devoted to predicting binding affinity over the past decades. We note growing use of both traditional machine learning and deep learning models for predicting binding affinity. With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction.
arXiv Detail & Related papers (2024-09-30T03:40:49Z) - Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction [50.7901190642594]
We propose BioKDN (Biomedical Knowledge Graph Denoising Network) for robust molecular interaction prediction.
BioKDN refines the reliable structure of local subgraphs by denoising noisy links in a learnable manner.
It maintains consistent and robust semantics by smoothing relations around the target interaction.
arXiv Detail & Related papers (2023-12-09T07:08:00Z) - ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z) - Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge [5.893494985371817]
Large language models (LLMs) can identify genes/proteins associated with pathways of interest and predict their interactions to a certain extent.
arXiv Detail & Related papers (2023-07-17T20:01:11Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z)