MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
- URL: http://arxiv.org/abs/2510.14400v2
- Date: Sat, 18 Oct 2025 11:57:45 GMT
- Title: MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
- Authors: Yingpeng Ning, Yuanyuan Sun, Ling Luo, Yanhua Wang, Yuchen Pan, Hongfei Lin
- Abstract summary: We propose MedTrust-Guided Iterative RAG, a framework designed to enhance factual consistency and mitigate hallucinations in medical QA. First, it enforces citation-aware reasoning by requiring all generated content to be explicitly grounded in retrieved medical documents. Second, it employs an iterative retrieval-verification process, where a verification agent assesses evidence adequacy.
- Score: 21.855579328680246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biomedical question answering (QA) requires accurate interpretation of complex medical knowledge. Large language models (LLMs) have shown promising capabilities in this domain, with retrieval-augmented generation (RAG) systems enhancing performance by incorporating external medical literature. However, RAG-based approaches in biomedical QA suffer from hallucinations due to post-retrieval noise and insufficient verification of retrieved evidence, undermining response reliability. We propose MedTrust-Guided Iterative RAG, a framework designed to enhance factual consistency and mitigate hallucinations in medical QA. Our method introduces three key innovations. First, it enforces citation-aware reasoning by requiring all generated content to be explicitly grounded in retrieved medical documents, with structured Negative Knowledge Assertions used when evidence is insufficient. Second, it employs an iterative retrieval-verification process, where a verification agent assesses evidence adequacy and refines queries through Medical Gap Analysis until reliable information is obtained. Third, it integrates the MedTrust-Align Module (MTAM) that combines verified positive examples with hallucination-aware negative samples, leveraging Direct Preference Optimization to reinforce citation-grounded reasoning while penalizing hallucination-prone response patterns.
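The iterative retrieval-verification loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the corpus, the word-overlap retriever, the `required_terms` adequacy check (standing in for the verification agent), and the query refinement step (standing in for Medical Gap Analysis) are all hypothetical simplifications chosen to keep the example self-contained.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for retrieved medical literature (hypothetical content).
CORPUS = {
    "doc1": "metformin is a first-line therapy for type 2 diabetes",
    "doc2": "metformin is contraindicated in severe renal impairment",
    "doc3": "aspirin reduces the risk of recurrent myocardial infarction",
}


@dataclass
class Answer:
    text: str
    citations: list = field(default_factory=list)
    sufficient: bool = False


def tokens(text):
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def retrieve(query, k=1):
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda d: len(q & tokens(CORPUS[d])), reverse=True)
    return ranked[:k]


def verify(required_terms, doc_ids):
    """Return required terms not covered by the evidence (empty set means adequate)."""
    covered = set()
    for d in doc_ids:
        covered |= tokens(CORPUS[d])
    return set(required_terms) - covered


def answer_with_verification(question, required_terms, max_rounds=3):
    """Retrieve, verify, and refine the query until evidence is adequate or rounds run out."""
    query, evidence = question, []
    for _ in range(max_rounds):
        for d in retrieve(query):
            if d not in evidence:
                evidence.append(d)
        missing = verify(required_terms, evidence)
        if not missing:
            # Every sentence in the answer carries the id of the document it came from.
            cited = " ".join(f"{CORPUS[d]} [{d}]" for d in evidence)
            return Answer(cited, list(evidence), True)
        # Toy gap analysis: steer the next query toward the missing terms.
        query = " ".join(sorted(missing))
    # Toy negative knowledge assertion: decline rather than answer without support.
    return Answer("No retrieved document supports an answer to this question.", [], False)


if __name__ == "__main__":
    print(answer_with_verification("is metformin safe", {"metformin", "renal"}))
    print(answer_with_verification("warfarin dosing", {"warfarin"}))
```

In this sketch the first call needs a second retrieval round before the evidence covers both required terms, while the second call exhausts its rounds and returns the refusal message with no citations. A real system would replace the term-coverage check with a model-based judgement of evidence adequacy.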
Related papers
- Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation [97.36081721024728]
We propose the first benchmark for assessing confidence in multi-turn interaction during realistic medical consultations. Our benchmark unifies three types of medical data for open-ended diagnostic generation. We present MedConf, an evidence-grounded linguistic self-assessment framework.
arXiv Detail & Related papers (2026-01-22T04:51:39Z) - MedRAGChecker: Claim-Level Verification for Biomedical Retrieval-Augmented Generation [8.37586466142299]
We introduce MedRAGChecker, a claim-level verification and diagnostic framework for biomedical RAG. Given a question, retrieved evidence, and a generated answer, MedRAGChecker decomposes the answer into atomic claims and estimates claim support. We show that MedRAGChecker reliably flags unsupported and contradicted claims and reveals distinct risk profiles across generators.
arXiv Detail & Related papers (2026-01-10T10:40:42Z) - Self-MedRAG: a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering [39.146761527401424]
Self-MedRAG is a self-reflective hybrid framework designed to mimic the iterative hypothesis-verification process of clinical reasoning. It integrates a hybrid retrieval strategy, combining sparse (BM25) and dense (Contriever) retrievers via Reciprocal Rank Fusion. It employs a generator to produce answers with supporting rationales, which are then assessed by a lightweight self-reflection module.
arXiv Detail & Related papers (2026-01-08T02:56:04Z) - When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare [0.05249805590164902]
This work investigates the performance of five large language models (LLMs) in generating responses to medicine-related queries. Our findings show that contradictions between highly similar abstracts do, in fact, degrade performance, leading to inconsistencies and reduced factual accuracy in model answers.
arXiv Detail & Related papers (2025-11-10T03:27:54Z) - From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering [26.91142737000078]
Existing approaches typically fall into two categories: Retrieval-Augmented Generation (RAG) and Generation-Augmented Generation (GAG). We propose MedRGAG, a unified retrieval-generation augmented framework that seamlessly integrates external and parametric knowledge for medical QA.
arXiv Detail & Related papers (2025-10-21T04:58:29Z) - VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z) - Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain [8.094811345546118]
Retrieval augmented generation (RAG) systems provide a method for factually grounding the responses of a Large Language Model (LLM) by providing retrieved evidence, or context, as support. This design introduces a critical vulnerability: LLMs may absorb and reproduce misinformation present in retrieved evidence. This problem is magnified if retrieved evidence contains adversarial material explicitly intended to promulgate misinformation.
arXiv Detail & Related papers (2025-09-04T00:45:58Z) - CaresAI at BioCreative IX Track 1 -- LLM for Biomedical QA [3.222047196930981]
Large language models (LLMs) are increasingly evident for accurate question answering across various domains. This paper presents our approach to the MedHopQA track of the BioCreative IX shared task. Three experimental setups are explored: fine-tuning on combined short and long answers, short answers only, and long answers only.
arXiv Detail & Related papers (2025-08-31T11:40:02Z) - Knowing or Guessing? Robust Medical Visual Question Answering via Joint Consistency and Contrastive Learning [34.6490677122246]
We show that current Medical Vision-Language Models (Med-VLMs) exhibit concerning fragility in Medical Visual Question Answering. We propose Consistency and Contrastive Learning (CCL), which integrates knowledge-anchored consistency learning and bias-aware contrastive learning. CCL achieves SOTA performance on three popular VQA benchmarks and notably improves answer consistency by 50% on the challenging RoMed test set.
arXiv Detail & Related papers (2025-08-26T05:21:19Z) - MedCoT-RAG: Causal Chain-of-Thought RAG for Medical Question Answering [4.285647375182588]
Large language models (LLMs) have shown promise in medical question answering but often struggle with hallucinations and shallow reasoning. Retrieval-augmented generation (RAG) offers a practical and privacy-preserving way to enhance LLMs with external medical knowledge. We introduce MedCoT-RAG, a domain-specific framework that combines causal-aware document retrieval with structured chain-of-thought prompting.
arXiv Detail & Related papers (2025-08-20T05:43:26Z) - MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph [57.54231831309079]
We introduce MedKGent, a framework for constructing temporally evolving medical Knowledge Graphs. We simulate the emergence of biomedical knowledge via a fine-grained daily time series. The resulting KG contains 156,275 entities and 2,971,384 relational triples.
arXiv Detail & Related papers (2025-08-17T15:14:03Z) - MIRA: A Novel Framework for Fusing Modalities in Medical RAG [6.044279952668295]
We introduce the Multimodal Intelligent Retrieval and Augmentation (MIRA) framework, designed to optimize factual accuracy in MLLM. MIRA consists of two key components: (1) a calibrated Rethinking and Rearrangement module that dynamically adjusts the number of retrieved contexts to manage factual risk, and (2) a medical RAG framework integrating image embeddings and a medical knowledge base with a query-rewrite module for efficient multimodal reasoning.
arXiv Detail & Related papers (2025-07-10T16:33:50Z) - Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.13261761812517]
We introduce FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a novel method for hallucination detection in RAG outputs. We present a new long-form Question Answering (QA) dataset annotated for both factuality and faithfulness.
arXiv Detail & Related papers (2025-05-27T11:56:59Z) - Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA [17.823588070044217]
We propose Discuss-RAG, a plug-and-play module designed to enhance the medical question answering system. Our method introduces a summarizer agent that orchestrates a team of medical experts to emulate multi-turn brainstorming, thereby improving the relevance of retrieved content. Experimental results on four benchmark medical QA datasets show that Discuss-RAG consistently outperforms MedRAG.
arXiv Detail & Related papers (2025-04-30T01:37:44Z) - Retrieval-Augmented Generation with Conflicting Evidence [57.66282463340297]
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. In practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources. We propose RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios for conflicting evidence for a user query.
arXiv Detail & Related papers (2025-04-17T16:46:11Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness [47.51360338851017]
Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence.
The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information.
Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task.
arXiv Detail & Related papers (2023-11-19T03:29:45Z)