Related papers: The Multi-Round Diagnostic RAG Framework for Emulating Clinical Reasoning

URL: http://arxiv.org/abs/2504.07724v2
Date: Tue, 05 Aug 2025 05:27:55 GMT
Title: The Multi-Round Diagnostic RAG Framework for Emulating Clinical Reasoning
Authors: Penglei Sun, Yixiang Chen, Xiang Li, Xiaowen Chu
Abstract summary: We construct DiagnosGraph, a knowledge graph covering both modern medicine and Traditional Chinese Medicine. To bridge the gap between colloquial patient narratives and academic medical knowledge, DiagnosGraph also introduces 1,908 medical records. Experiments conducted on four medical benchmarks, with evaluations by human physicians, demonstrate that MRD-RAG enhances the diagnostic performance of LLMs.
Score: 10.483453944197407
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, accurately and quickly deploying medical large language models (LLMs) has become a trend. Among deployment approaches, retrieval-augmented generation (RAG) has garnered attention due to rapid deployment and privacy protection. However, one challenge hinders the practical deployment of RAG for medical diagnosis: the semantic gap between colloquial patient descriptions and the professional terminology within medical knowledge bases. We try to address the challenge from both the data perspective and the method perspective. First, to address the semantic gap in existing knowledge bases, we construct DiagnosGraph, a generalist knowledge graph covering both modern medicine and Traditional Chinese Medicine. It contains 876 common diseases in a graph of 7,997 nodes and 37,201 triples. To bridge the gap between colloquial patient narratives and academic medical knowledge, DiagnosGraph also introduces 1,908 medical records by formalizing the patient chief complaint and proposing a medical diagnosis. Second, we introduce the Multi-Round Diagnostic RAG (MRD-RAG) framework. It utilizes a multi-round dialogue to refine diagnostic possibilities, emulating the clinical reasoning of a physician. Experiments conducted on four medical benchmarks, with evaluations by human physicians, demonstrate that MRD-RAG enhances the diagnostic performance of LLMs, highlighting its potential to make automated diagnosis more accurate and human-aligned.

Related papers

Sequential Diagnosis with Language Models [21.22416732642907]
We introduce the Sequential Diagnosis Benchmark, which transforms 304 diagnostically challenging cases into stepwise diagnostic encounters. Performance is assessed not just by diagnostic accuracy but also by the cost of physician visits and tests performed. We also present the MAI Diagnostic Orchestrator (MAI-DxO), a model-agnostic orchestrator that simulates a panel of physicians.
arXiv Detail & Related papers (2025-06-27T17:27:26Z)
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [58.78045864541539]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z)
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models [25.13622249539088]
DiagnosisArena is a benchmark designed to rigorously assess professional-level diagnostic competence. DiagnosisArena consists of 1,113 pairs of segmented patient cases and corresponding diagnoses, spanning 28 medical specialties. Our study reveals that even the most advanced reasoning models, o3, o1, and DeepSeek-R1, achieve only 51.12%, 31.09%, and 17.79% accuracy, respectively.
arXiv Detail & Related papers (2025-05-20T09:14:53Z)
MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow [16.089816031251335]
Multi-modal Large Language Models (MLLMs) have gained significant attention and achieved success across various domains. They lack detailed perception of visual inputs, limiting their ability to perform quantitative image analysis. We propose MedAgent-Pro, an evidence-based reasoning agentic system designed to achieve reliable, explainable, and precise medical diagnoses.
arXiv Detail & Related papers (2025-03-21T14:04:18Z)
MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation [20.622990699649694]
Multi-role collaboration in MDT consultations often results in excessively long dialogue histories. We propose a multi-agent MDT medical consultation framework based on Large Language Models (LLMs) to address these issues. Our framework uses consensus aggregation and a residual discussion structure for multi-round consultations. It also employs a Correct Answer Knowledge Base (CorrectKB) and a Chain-of-Thought Knowledge Base (ChainKB) to accumulate consultation experience.
arXiv Detail & Related papers (2025-03-18T03:07:34Z)
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot [47.77948063906033]
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records. This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain. Tests show MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates.
arXiv Detail & Related papers (2025-02-06T12:27:35Z)
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs). We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets. Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
MAGDA: Multi-agent guideline-driven diagnostic assistance [43.15066219293877]
In emergency departments, rural hospitals, or clinics in less developed regions, clinicians often lack fast image analysis by trained radiologists. In this work, we introduce a new approach for zero-shot guideline-driven decision support. We model a system of multiple LLM agents augmented with a contrastive vision-language model that collaborate to reach a patient diagnosis.
arXiv Detail & Related papers (2024-09-10T09:10:30Z)
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians. Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models [57.88111980149541]
We introduce Asclepius, a novel Med-MLLM benchmark that assesses Med-MLLMs in terms of distinct medical specialties and different diagnostic capacities. Grounded in 3 proposed core principles, Asclepius ensures a comprehensive evaluation by encompassing 15 medical specialties. We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 3 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z)
Medical Dialogue Generation via Intuitive-then-Analytical Differential Diagnosis [14.17497921394565]
We propose a medical dialogue generation framework with the Intuitive-then-Analytic Differential Diagnosis (IADDx). Our method starts with a differential diagnosis via retrieval-based intuitive association and subsequently refines it through a graph-enhanced analytic procedure. Experimental results on two datasets validate the efficacy of our method.
arXiv Detail & Related papers (2024-01-12T12:35:19Z)
Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study [6.10474409373543]
We propose an innovative approach for augmenting the proficiency of Large Language Models (LLMs) in automated diagnosis generation. We derive the KG from the National Library of Medicine's Unified Medical Language System (UMLS), a robust repository of biomedical knowledge. Our approach offers an explainable diagnostic pathway, edging us closer to the realization of AI-augmented diagnostic decision support systems.
arXiv Detail & Related papers (2023-08-28T06:05:18Z)
BMAD: Benchmarks for Medical Anomaly Detection [51.22159321912891]
Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision. In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions. We introduce a comprehensive evaluation benchmark for assessing anomaly detection methods on medical images.
arXiv Detail & Related papers (2023-06-20T20:23:46Z)
ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs [48.11532667875847]
ChatCAD+ is a tool to generate high-quality medical reports and provide reliable medical advice. The Reliable Report Generation module is capable of interpreting medical images and generating high-quality medical reports. The Reliable Interaction module leverages up-to-date information from reputable medical websites to provide reliable medical advice.
arXiv Detail & Related papers (2023-05-25T12:03:31Z)
Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.