GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration
- URL: http://arxiv.org/abs/2501.16382v1
- Date: Fri, 24 Jan 2025 18:16:53 GMT
- Title: GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration
- Authors: Ziwen Li, Xiang 'Anthony' Chen, Youngseung Jeon
- Abstract summary: Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks have accelerated drug discovery. GraPPI is a large-scale knowledge graph (KG)-based retrieve-divide-solve agent pipeline RAG framework to support large-scale PPI signaling pathway exploration.
- Score: 13.390039857939168
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Drug discovery (DD) has tremendously contributed to maintaining and improving public health. Hypothesizing that inhibiting protein misfolding can slow disease progression, researchers focus on target identification (Target ID) to find protein structures for drug binding. While Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks have accelerated drug discovery, integrating models into cohesive workflows remains challenging. We conducted a user study with drug discovery researchers to identify the applicability of LLMs and RAGs in Target ID. We identified two main findings: 1) an LLM should provide multiple Protein-Protein Interactions (PPIs) based on an initial protein and protein candidates that have a therapeutic impact; 2) the model must provide the PPI and relevant explanations for better understanding. Based on these observations, we identified three limitations in previous approaches for Target ID: 1) semantic ambiguity, 2) lack of explainability, and 3) short retrieval units. To address these issues, we propose GraPPI, a large-scale knowledge graph (KG)-based retrieve-divide-solve agent pipeline RAG framework to support large-scale PPI signaling pathway exploration in understanding therapeutic impacts by decomposing the analysis of entire PPI pathways into sub-tasks focused on the analysis of PPI edges.
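The retrieve-divide-solve idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the breadth-first retrieval, and the names `retrieve_pathway`, `divide_into_edges`, `solve_edge`, and `explore` are all assumptions made for illustration, and the per-edge "solve" step is a placeholder where a real system would presumably call an LLM agent.

```python
from collections import deque

# Hypothetical toy PPI knowledge graph: protein -> proteins it interacts with.
# The node names are placeholders, not real proteins.
TOY_KG = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def retrieve_pathway(kg, start, target):
    """Retrieve: breadth-first search for one shortest pathway (assumed strategy)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in kg.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no pathway connects start and target


def divide_into_edges(path):
    """Divide: split a pathway into its consecutive PPI edges."""
    return list(zip(path, path[1:]))


def solve_edge(edge):
    """Solve: placeholder for the per-edge analysis sub-task."""
    src, dst = edge
    return f"{src} interacts with {dst}"


def explore(kg, start, target):
    """Run retrieve, divide, and solve in sequence; return one result per edge."""
    path = retrieve_pathway(kg, start, target)
    if path is None:
        return []
    return [solve_edge(edge) for edge in divide_into_edges(path)]


if __name__ == "__main__":
    for line in explore(TOY_KG, "A", "D"):
        print(line)
```

Splitting the pathway into edges keeps each sub-task small, which is the property the abstract attributes to decomposing pathway analysis into edge-level analysis.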
Related papers
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge. It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z) - PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [80.08310253195144]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z) - KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction [60.23701115249195]
KEPLA is a novel deep learning framework that integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. Experiments on two benchmark datasets demonstrate that KEPLA consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T08:02:42Z) - RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery [12.637452293481681]
Retrieving the biological impacts of protein-protein interactions (PPIs) is essential for target identification (Target ID) in drug development. No benchmark currently exists for identifying the biological impacts of PPIs. We introduce the RAG Benchmark for PPIs (RAGPPI), a factual question-answer benchmark of 4,420 question-answer pairs.
arXiv Detail & Related papers (2025-05-28T05:48:25Z) - Hallucination Detection in LLMs via Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models.
We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - Joint Masked Reconstruction and Contrastive Learning for Mining Interactions Between Proteins [4.254824555546419]
Protein-protein interaction (PPI) prediction is an instrumental means in elucidating the mechanisms underlying cellular operations.
This paper introduces a novel PPI prediction method that combines masked reconstruction and contrastive learning, termed JmcPPI.
Extensive experiments conducted on three widely utilized PPI datasets demonstrate that JmcPPI surpasses existing optimal baseline models.
arXiv Detail & Related papers (2025-03-06T17:39:12Z) - Protein Large Language Models: A Comprehensive Survey [71.65899614084853]
Protein-specific large language models (Protein LLMs) are revolutionizing protein science by enabling more efficient protein structure prediction, function annotation, and design.
This work provides the first comprehensive overview of Protein LLMs, covering their architectures, training datasets, evaluation metrics, and diverse applications.
arXiv Detail & Related papers (2025-02-21T19:22:10Z) - scGSDR: Harnessing Gene Semantics for Single-Cell Pharmacological Profiling [5.831554646284266]
scGSDR is a model that integrates two computational pipelines grounded in the knowledge of cellular states and gene signaling pathways.
scGSDR enhances predictive performance by incorporating gene semantics and employs an interpretability module.
The model's application has extended from single-drug predictions to scenarios involving drug combinations.
arXiv Detail & Related papers (2025-02-02T15:43:20Z) - DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction [8.98329812378801]
DrugAgent is a multi-agent system for drug-target interaction prediction.
It combines multiple specialized perspectives with transparent reasoning.
Our approach provides detailed, human-interpretable reasoning for each prediction.
arXiv Detail & Related papers (2024-08-23T21:24:59Z) - Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models [46.05020842978823]
Large Language Models (LLMs) have emerged as powerful tools to navigate this complex data landscape.
RAGGED is a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation.
arXiv Detail & Related papers (2024-07-17T07:44:18Z) - SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG).
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z) - PGraphDTA: Improving Drug Target Interaction Prediction using Protein Language Models and Contact Maps [4.590060921188914]
A key aspect of drug discovery involves identifying novel drug-target (DT) interactions.
Protein-ligand interactions exhibit a continuum of binding strengths, known as binding affinity.
We propose novel enhancements to improve their performance.
arXiv Detail & Related papers (2023-10-06T05:00:25Z) - SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
arXiv Detail & Related papers (2022-06-20T14:53:25Z) - AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands [9.135203550164833]
We show that state-of-the-art models fail to generalize to novel structures.
We introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training.
We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins.
arXiv Detail & Related papers (2021-12-25T01:52:58Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.