KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
- URL: http://arxiv.org/abs/2410.04660v2
- Date: Mon, 03 Mar 2025 18:23:47 GMT
- Title: KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
- Authors: Xiaorui Su, Yibo Wang, Shanghua Gao, Xiaolong Liu, Valentina Giunchiglia, Djork-Arné Clevert, Marinka Zitnik
- Abstract summary: KGARevion is a knowledge graph-based agent that answers knowledge-intensive questions. It generates relevant triplets by leveraging the latent knowledge embedded in a large language model. It then verifies these triplets against a grounded knowledge graph, filtering out errors and retaining only accurate, contextually relevant information.
- Score: 31.080514888803886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical reasoning integrates structured, codified knowledge with tacit, experience-driven insights. Depending on the context, quantity, and nature of available evidence, researchers and clinicians use diverse strategies, including rule-based, prototype-based, and case-based reasoning. Effective medical AI models must handle this complexity while ensuring reliability and adaptability. We introduce KGARevion, a knowledge graph-based agent that answers knowledge-intensive questions. Upon receiving a query, KGARevion generates relevant triplets by leveraging the latent knowledge embedded in a large language model. It then verifies these triplets against a grounded knowledge graph, filtering out errors and retaining only accurate, contextually relevant information for the final answer. This multi-step process strengthens reasoning, adapts to different models of medical inference, and outperforms retrieval-augmented generation-based approaches that lack effective verification mechanisms. Evaluations on medical QA benchmarks show that KGARevion improves accuracy by over 5.2% over 15 models in handling complex medical queries. To further assess its effectiveness, we curated three new medical QA datasets with varying levels of semantic complexity, where KGARevion improved accuracy by 10.4%. The agent integrates with different LLMs and biomedical knowledge graphs for broad applicability across knowledge-intensive tasks. We evaluated KGARevion on AfriMed-QA, a newly introduced dataset focused on African healthcare, demonstrating its strong zero-shot generalization to underrepresented medical contexts.
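The generate-then-verify flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy knowledge graph, the function names, and the fixed candidate list that stands in for the LLM generation step. Exact-match lookup stands in for the verification step, which the abstract does not specify; this is not the authors' implementation.

```python
# Illustrative sketch only: names, data, and the exact-match check are hypothetical.
Triplet = tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical toy knowledge graph standing in for a grounded biomedical KG.
TOY_KG: set[Triplet] = {
    ("aspirin", "treats", "headache"),
    ("metformin", "treats", "type 2 diabetes"),
}


def generate_triplets(question: str) -> list[Triplet]:
    """Stand-in for the LLM step: returns fixed candidate triplets."""
    return [
        ("metformin", "treats", "type 2 diabetes"),
        ("metformin", "treats", "headache"),  # deliberately unsupported candidate
    ]


def verify_triplets(candidates: list[Triplet], kg: set[Triplet]) -> list[Triplet]:
    """Keep only candidates found in the knowledge graph (simplified exact match)."""
    return [t for t in candidates if t in kg]


def answer_context(question: str, kg: set[Triplet] = TOY_KG) -> list[Triplet]:
    """Generate candidates, then filter them against the knowledge graph."""
    return verify_triplets(generate_triplets(question), kg)


if __name__ == "__main__":
    print(answer_context("Which drug treats type 2 diabetes?"))
```

Only the triplet present in the toy graph survives the filter; a real system would pass the surviving triplets to the model as context for the final answer.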
Related papers
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - Uncertainty-aware abstention in medical diagnosis based on medical texts [87.88110503208016]
This study addresses the critical issue of reliability for AI-assisted medical diagnosis.
We focus on the selective prediction approach, which allows the diagnosis system to abstain from providing a decision if it is not confident in the diagnosis.
We introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks.
arXiv Detail & Related papers (2025-02-25T10:15:21Z) - Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge [6.977177904883792]
AMG-RAG is a comprehensive framework that automates the construction and continuous updating of medical knowledge graphs.
It integrates reasoning and retrieves current external evidence from sources such as PubMed and WikiSearch.
It achieves an F1 score of 74.1 percent on MEDQA and an accuracy of 66.34 percent on MEDMCQA, outperforming both comparable models and those 10 to 100 times larger.
arXiv Detail & Related papers (2025-02-18T16:29:45Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.
Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.
We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z) - Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language models (LLMs) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z) - A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? [33.70022886795487]
OpenAI's o1 stands out as the first model with a chain-of-thought technique using reinforcement learning strategies.
This report provides a comprehensive exploration of o1 on different medical scenarios, examining 3 key aspects: understanding, reasoning, and multilinguality.
arXiv Detail & Related papers (2024-09-23T17:59:43Z) - ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering [54.80411755871931]
Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth.
Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into a readily understandable format.
This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful.
We introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data.
arXiv Detail & Related papers (2024-07-24T01:46:55Z) - emrQA-msquad: A Medical Dataset Structured with the SQuAD V2.0 Framework, Enriched with emrQA Medical Information [2.2083091880368855]
The emrQA-msquad dataset was developed to address the intricacies of medical terminology.
A dedicated medical dataset for the Span extraction task was introduced, reinforcing the system's robustness.
The fine-tuning of models such as BERT, RoBERTa, and Tiny RoBERTa significantly improved response accuracy within the F1-score range of 0.75 to 1.00.
arXiv Detail & Related papers (2024-04-18T10:06:00Z) - XAIQA: Explainer-Based Data Augmentation for Extractive Question
Answering [1.1867812760085572]
We introduce a novel approach, XAIQA, for generating synthetic QA pairs at scale from data naturally available in electronic health records.
Our method uses the idea of a classification model explainer to generate questions and answers about medical concepts corresponding to medical codes.
arXiv Detail & Related papers (2023-12-06T15:59:06Z) - MKA: A Scalable Medical Knowledge Assisted Mechanism for Generative
Models on Medical Conversation Tasks [3.9571320117430866]
The mechanism aims to assist general neural generative models to achieve better performance on the medical conversation task.
The medical-specific knowledge graph is designed within the mechanism, which contains 6 types of medical-related information.
The evaluation results demonstrate that models combined with our mechanism outperform original methods in multiple automatic evaluation metrics.
arXiv Detail & Related papers (2023-12-05T04:55:54Z) - Generating Explanations in Medical Question-Answering by Expectation
Maximization Inference over Evidence [33.018873142559286]
We propose a novel approach for generating natural language explanations for answers predicted by medical QA systems.
Our system extracts knowledge from medical textbooks to enhance the quality of explanations during the explanation generation process.
arXiv Detail & Related papers (2023-10-02T16:00:37Z) - Knowledge-injected Prompt Learning for Chinese Biomedical Entity
Normalization [6.927883826415262]
We propose a novel Knowledge-injected Prompt Learning (PL-Knowledge) method to tackle the Biomedical Entity Normalization (BEN) task.
Specifically, our approach consists of five stages: candidate entity matching, knowledge extraction, knowledge encoding, knowledge injection, and prediction output.
By effectively encoding the knowledge items contained in medical entities, the additional knowledge enhances the model's ability to capture latent relationships between medical entities.
arXiv Detail & Related papers (2023-08-23T09:32:40Z) - A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises [52.31710895034573]
This work provides the first comprehensive review of healthcare knowledge graphs (HKGs).
It summarizes the pipeline and key techniques for HKG construction, as well as the common utilization approaches.
At the application level, we delve into the successful integration of HKGs across various health domains.
arXiv Detail & Related papers (2023-06-07T21:51:56Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - Towards Medical Artificial General Intelligence via Knowledge-Enhanced
Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z) - BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph
Construction and Analysis [0.4893345190925178]
We propose an end-to-end approach for knowledge extraction and analysis from biomedical clinical notes.
The proposed framework can successfully extract relevant structured information with high accuracy.
arXiv Detail & Related papers (2023-04-21T14:45:33Z) - HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented
Prompting [33.1455954220194]
HiPrompt is a supervision-efficient knowledge fusion framework.
It elicits the few-shot reasoning ability of large language models through hierarchy-oriented prompts.
Empirical results on the collected KG-Hi-BKF benchmark datasets demonstrate the effectiveness of HiPrompt.
arXiv Detail & Related papers (2023-04-12T16:54:26Z) - Informing clinical assessment by contextualizing post-hoc explanations
of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patients' clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - SSD-KD: A Self-supervised Diverse Knowledge Distillation Method for
Lightweight Skin Lesion Classification Using Dermoscopic Images [62.60956024215873]
Skin cancer is one of the most common types of malignancy, affecting a large population and causing a heavy economic burden worldwide.
Most studies in skin cancer detection keep pursuing high prediction accuracies without considering the limitation of computing resources on portable devices.
This study specifically proposes a novel method, termed SSD-KD, that unifies diverse knowledge into a generic KD framework for skin diseases classification.
arXiv Detail & Related papers (2022-03-22T06:54:29Z) - DIVERSE: Bayesian Data IntegratiVE learning for precise drug ResponSE prediction [27.531532648298768]
DIVERSE is a framework to predict drug responses from data of cell lines, drugs, and gene interactions.
It integrates data sources systematically, in a step-wise manner, examining the importance of each added data set in turn.
It clearly outperformed five other methods including three state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T12:40:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.