Related papers: Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA

Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA

URL: http://arxiv.org/abs/2504.21252v1
Date: Wed, 30 Apr 2025 01:37:44 GMT
Title: Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA
Authors: Xuanzhao Dong, Wenhui Zhu, Hao Wang, Xiwen Chen, Peijie Qiu, Rui Yin, Yi Su, Yalin Wang,
Abstract summary: We propose Discuss-RAG, a plug-and-play module designed to enhance the medical question answering system.<n>Our method introduces a summarizer agent that orchestrates a team of medical experts to emulate multi-turn brainstorming, thereby improving the relevance of retrieved content.<n> Experimental results on four benchmark medical QA datasets show that Discuss-RAG consistently outperforms MedRAG.
Score: 17.823588070044217
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Medical question answering (QA) is a reasoning-intensive task that remains challenging for large language models (LLMs) due to hallucinations and outdated domain knowledge. Retrieval-Augmented Generation (RAG) provides a promising post-training solution by leveraging external knowledge. However, existing medical RAG systems suffer from two key limitations: (1) a lack of modeling for human-like reasoning behaviors during information retrieval, and (2) reliance on suboptimal medical corpora, which often results in the retrieval of irrelevant or noisy snippets. To overcome these challenges, we propose Discuss-RAG, a plug-and-play module designed to enhance the medical QA RAG system through collaborative agent-based reasoning. Our method introduces a summarizer agent that orchestrates a team of medical experts to emulate multi-turn brainstorming, thereby improving the relevance of retrieved content. Additionally, a decision-making agent evaluates the retrieved snippets before their final integration. Experimental results on four benchmark medical QA datasets show that Discuss-RAG consistently outperforms MedRAG, especially significantly improving answer accuracy by up to 16.67% on BioASQ and 12.20% on PubMedQA. The code is available at: https://github.com/LLM-VLM-GSL/Discuss-RAG.

Related papers

MIRA: A Novel Framework for Fusing Modalities in Medical RAG [6.044279952668295]
We introduce the Multimodal Intelligent Retrieval and Augmentation (MIRA) framework, designed to optimize factual accuracy in MLLM.<n>MIRA consists of two key components: (1) a calibrated Rethinking and Rearrangement module that dynamically adjusts the number of retrieved contexts to manage factual risk, and (2) A medical RAG framework integrating image embeddings and a medical knowledge base with a query-rewrite module for efficient multimodal reasoning.
arXiv Detail & Related papers (2025-07-10T16:33:50Z)
MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B.<n>MedGemma demonstrates advanced medical understanding and reasoning on images and text.<n>We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z)
A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning [47.77948063906033]
MedRAG is a smart multimodal healthcare copilot equipped with powerful large language model (LLM) reasoning.<n>It supports multiple input modalities, including non-intrusive voice monitoring, general medical queries, and electronic health records.<n>MedRAG retrieves and integrates critical diagnostic insights, reducing the risk of misdiagnosis.
arXiv Detail & Related papers (2025-06-03T05:39:02Z)
Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge [6.977177904883792]
AMG-RAG is a framework that automates the construction and continuous updating of medical knowledge graphs.<n>It integrates reasoning, and retrieves current external evidence, such as PubMed and WikiSearch.<n>It achieves an F1 score of 74.1 percent on MEDQA and an accuracy of 66.34 percent on MEDMCQA, outperforming both comparable models and those 10 to 100 times larger.
arXiv Detail & Related papers (2025-02-18T16:29:45Z)
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot [47.77948063906033]
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records.<n>This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain.<n>Tests show MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates.
arXiv Detail & Related papers (2025-02-06T12:27:35Z)
LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models [18.6994780408699]
Large Language Models (LLMs) face significant challenges in medical question answering. We propose a novel approach incorporating similar case generation within a multi-agent medical question-answering system. Our method capitalizes on the model's inherent medical knowledge and reasoning capabilities, eliminating the need for additional training data.
arXiv Detail & Related papers (2024-12-31T19:55:45Z)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs) We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets. Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering [18.8818391508042]
Large language models (LLM) hold significant potential for applications in biomedicine. RAG$2$ is a new framework for enhancing the reliability of RAG in biomedical contexts.
arXiv Detail & Related papers (2024-11-01T01:40:23Z)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.<n>Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.<n>We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals. GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date. It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z)
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions [42.73799041840482]
i-MedRAG is a system that iteratively asks follow-up queries based on previous information-seeking attempts. Our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5. i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing an in-depth analysis of medical questions.
arXiv Detail & Related papers (2024-08-01T17:18:17Z)
Benchmarking Retrieval-Augmented Generation for Medicine [30.390132015614128]
Large language models (LLMs) have achieved state-of-the-art performance on a wide range of medical question answering (QA) tasks. Retrieval-augmented generation (RAG) is a promising solution and has been widely adopted. We propose the Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE), a first-of-its-kind benchmark including 7,663 questions from five medical QA datasets.
arXiv Detail & Related papers (2024-02-20T17:44:06Z)
Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision [53.692793122749414]
We introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision. Our system is a pipeline that first summarizes a long, medical, user-written question, using a supervised summarization loss. The system first matches the summarized user question with an FAQ from a trusted medical knowledge base, and then retrieves a fixed number of relevant sentences from the corresponding answer document.
arXiv Detail & Related papers (2022-09-30T08:20:32Z)
MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release a large-scale high-quality Medical Dialogue dataset related to 12 types of common Gastrointestinal diseases named MedDG. We propose two kinds of medical dialogue tasks based on MedDG dataset. One is the next entity prediction and the other is the doctor response generation. Experimental results show that the pre-train language models and other baselines struggle on both tasks with poor performance in our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.