MedRAGChecker: Claim-Level Verification for Biomedical Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2601.06519v1
- Date: Sat, 10 Jan 2026 10:40:42 GMT
- Title: MedRAGChecker: Claim-Level Verification for Biomedical Retrieval-Augmented Generation
- Authors: Yuelyu Ji, Min Gu Kwak, Hang Zhang, Xizhi Wu, Chenyu Li, Yanshan Wang
- Abstract summary: We introduce MedRAGChecker, a claim-level verification and diagnostic framework for biomedical RAG. Given a question, retrieved evidence, and a generated answer, MedRAGChecker decomposes the answer into atomic claims and estimates claim support. We show that MedRAGChecker reliably flags unsupported and contradicted claims and reveals distinct risk profiles across generators.
- Score: 8.37586466142299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical retrieval-augmented generation (RAG) can ground LLM answers in medical literature, yet long-form outputs often contain isolated unsupported or contradictory claims with safety implications. We introduce MedRAGChecker, a claim-level verification and diagnostic framework for biomedical RAG. Given a question, retrieved evidence, and a generated answer, MedRAGChecker decomposes the answer into atomic claims and estimates claim support by combining evidence-grounded natural language inference (NLI) with biomedical knowledge-graph (KG) consistency signals. Aggregating claim decisions yields answer-level diagnostics that help disentangle retrieval and generation failures, including faithfulness, under-evidence, contradiction, and safety-critical error rates. To enable scalable evaluation, we distill the pipeline into compact biomedical models and use an ensemble verifier with class-specific reliability weighting. Experiments on four biomedical QA benchmarks show that MedRAGChecker reliably flags unsupported and contradicted claims and reveals distinct risk profiles across generators, particularly on safety-critical biomedical relations.
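The abstract describes two concrete steps: aggregating per-claim verdicts into answer-level diagnostic rates, and combining verifiers with class-specific reliability weighting. The sketch below is a minimal illustration of both ideas under assumed interfaces; the verdict labels, rate names, and weighting scheme are hypothetical, not the paper's actual API.

```python
from collections import Counter, defaultdict

def aggregate_diagnostics(claim_verdicts):
    """Aggregate per-claim verdicts into answer-level rates.
    claim_verdicts: list of 'supported' | 'under_evidenced' | 'contradicted'."""
    n = len(claim_verdicts)
    counts = Counter(claim_verdicts)
    return {
        "faithfulness": counts["supported"] / n,
        "under_evidence_rate": counts["under_evidenced"] / n,
        "contradiction_rate": counts["contradicted"] / n,
    }

def ensemble_vote(votes, reliability):
    """Pick a claim label by reliability-weighted voting (illustrative).
    votes: {verifier_name: label}; reliability: {verifier_name: {label: weight}},
    i.e. each verifier carries a class-specific weight for the label it emits."""
    scores = defaultdict(float)
    for verifier, label in votes.items():
        scores[label] += reliability[verifier].get(label, 1.0)
    return max(scores, key=scores.get)

verdicts = ["supported", "supported", "under_evidenced", "contradicted"]
diag = aggregate_diagnostics(verdicts)  # faithfulness 0.5, each error rate 0.25
```

A dedicated rate for safety-critical relations (mentioned in the abstract) would be computed the same way over the subset of claims tagged as safety-critical.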
Related papers
- Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation [97.36081721024728]
We propose the first benchmark for assessing confidence in multi-turn interaction during realistic medical consultations. Our benchmark unifies three types of medical data for open-ended diagnostic generation. We present MedConf, an evidence-grounded linguistic self-assessment framework.
arXiv Detail & Related papers (2026-01-22T04:51:39Z) - Self-MedRAG: a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering [39.146761527401424]
Self-MedRAG is a self-reflective hybrid framework designed to mimic the iterative hypothesis-verification process of clinical reasoning. It integrates a hybrid retrieval strategy, combining sparse (BM25) and dense (Contriever) retrievers via Reciprocal Rank Fusion. It employs a generator to produce answers with supporting rationales, which are then assessed by a lightweight self-reflection module.
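The Reciprocal Rank Fusion step mentioned above is a standard rank-combination formula: each document's fused score is the sum over retrievers of 1/(k + rank), typically with k = 60. A minimal sketch (document IDs are placeholders):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists (e.g. BM25 and a dense retriever).
    rankings: list of ranked doc-id lists, best first.
    Standard RRF: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])  # d1 and d3 rise to the top
```

The constant k damps the influence of any single retriever's top ranks, which is why RRF is robust when the two retrievers' score scales are incomparable.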
arXiv Detail & Related papers (2026-01-08T02:56:04Z) - MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering [21.855579328680246]
We propose MedTrust-Guided Iterative RAG, a framework designed to enhance factual consistency and mitigate hallucinations in medical QA. First, it enforces citation-aware reasoning by requiring all generated content to be explicitly grounded in retrieved medical documents. Second, it employs an iterative retrieval-verification process, where a verification agent assesses evidence adequacy.
arXiv Detail & Related papers (2025-10-16T07:59:11Z) - Combating Biomedical Misinformation through Multi-modal Claim Detection and Evidence-based Verification [11.555285143713315]
CER (Combining Evidence and Reasoning) is a novel framework for biomedical fact-checking. It integrates scientific evidence retrieval, reasoning via large language models, and supervised veracity prediction. It effectively mitigates the risk of hallucinations, ensuring that generated outputs are grounded in verifiable, evidence-based sources.
arXiv Detail & Related papers (2025-09-17T10:31:09Z) - Combining Evidence and Reasoning for Biomedical Fact-Checking [11.555285143713315]
CER (Combining Evidence and Reasoning) is a novel framework for biomedical fact-checking. It integrates scientific evidence retrieval, reasoning via large language models, and supervised veracity prediction.
arXiv Detail & Related papers (2025-09-17T10:14:56Z) - HeteroRAG: A Heterogeneous Retrieval-Augmented Generation Framework for Medical Vision Language Tasks [22.597677744620295]
We present HeteroRAG, a novel framework that enhances Med-LVLMs through heterogeneous knowledge sources. HeteroRAG achieves state-of-the-art performance in most medical vision language benchmarks.
arXiv Detail & Related papers (2025-08-18T09:54:10Z) - MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph [57.54231831309079]
We introduce MedKGent, a framework for constructing temporally evolving medical knowledge graphs. We simulate the emergence of biomedical knowledge via a fine-grained daily time series. The resulting KG contains 156,275 entities and 2,971,384 relational triples.
arXiv Detail & Related papers (2025-08-17T15:14:03Z) - Fact or Guesswork? Evaluating Large Language Models' Medical Knowledge with Structured One-Hop Judgments [108.55277188617035]
Large language models (LLMs) have been widely adopted in various downstream task domains, but their ability to directly recall and apply factual medical knowledge remains under-explored. We introduce the Medical Knowledge Judgment dataset (MKJ), derived from the Unified Medical Language System (UMLS), a comprehensive repository of standardized vocabularies and knowledge graphs. Through a binary classification framework, MKJ evaluates LLMs' grasp of fundamental medical facts by having them assess the validity of concise, one-hop statements.
arXiv Detail & Related papers (2025-02-20T05:27:51Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce the Medical Retrieval-Augmented Generation Benchmark (MedRGB), which provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.