MedKGI: Iterative Differential Diagnosis with Medical Knowledge Graphs and Information-Guided Inquiring
- URL: http://arxiv.org/abs/2512.24181v2
- Date: Sun, 04 Jan 2026 11:47:36 GMT
- Title: MedKGI: Iterative Differential Diagnosis with Medical Knowledge Graphs and Information-Guided Inquiring
- Authors: Qipeng Wang, Rui Sheng, Yafei Li, Huamin Qu, Yushi Sun, Min Zhu,
- Abstract summary: We propose MedKGI, a diagnostic framework grounded in clinical practices.<n>We show that MedKGI outperforms strong LLM baselines in both diagnostic accuracy and inquiry efficiency.
- Score: 39.776843923694244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated significant promise in clinical diagnosis. However, current models struggle to emulate the iterative, diagnostic hypothesis-driven reasoning of real clinical scenarios. Specifically, current LLMs suffer from three critical limitations: (1) generating hallucinated medical content due to weak grounding in verified knowledge, (2) asking redundant or inefficient questions rather than discriminative ones that hinder diagnostic progress, and (3) losing coherence over multi-turn dialogues, leading to contradictory or inconsistent conclusions. To address these challenges, we propose MedKGI, a diagnostic framework grounded in clinical practices. MedKGI integrates a medical knowledge graph (KG) to constrain reasoning to validated medical ontologies, selects questions based on information gain to maximize diagnostic efficiency, and adopts an OSCE-format structured state to maintain consistent evidence tracking across turns. Experiments on clinical benchmarks show that MedKGI outperforms strong LLM baselines in both diagnostic accuracy and inquiry efficiency, improving dialogue efficiency by 30% on average while maintaining state-of-the-art accuracy.
Related papers
- MedClarify: An information-seeking AI agent for medical diagnosis with case-specific follow-up questions [26.936554184582096]
We introduce MedClarify, an AI agent for information-seeking that can generate follow-up questions for iterative reasoning.<n>Specifically, MedClarify computes a list of candidate diagnoses analogous to a differential diagnosis, and then proactively generates follow-up questions.
arXiv Detail & Related papers (2026-02-19T12:19:12Z) - M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding [66.78251988482222]
Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models by encouraging step-by-step intermediate reasoning.<n>Current benchmarks for medical image understanding generally focus on the final answer while ignoring the reasoning path.<n>M3CoTBench aims to foster the development of transparent, trustworthy, and diagnostically accurate AI systems for healthcare.
arXiv Detail & Related papers (2026-01-13T17:42:27Z) - MedEinst: Benchmarking the Einstellung Effect in Medical LLMs through Counterfactual Differential Diagnosis [13.241795322837861]
We introduce MedEinst, a counterfactual benchmark with 5,383 paired clinical cases across 49 diseases.<n>We measure susceptibility via Bias Trap Rate--probability of misdiagnosing traps despite correctly diagnosing controls.
arXiv Detail & Related papers (2026-01-10T17:39:25Z) - ClinDEF: A Dynamic Evaluation Framework for Large Language Models in Clinical Reasoning [58.01333341218153]
We propose ClinDEF, a dynamic framework for assessing clinical reasoning in LLMs through simulated diagnostic dialogues.<n>Our method generates patient cases and facilitates multi-turn interactions between an LLM-based doctor and an automated patient agent.<n>Experiments show that ClinDEF effectively exposes critical clinical reasoning gaps in state-of-the-art LLMs.
arXiv Detail & Related papers (2025-12-29T12:58:58Z) - Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning.<n>LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data.<n>We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
arXiv Detail & Related papers (2025-10-21T18:10:45Z) - Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs)<n>Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a (oral) examination in medical training.<n>Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - Integrating clinical reasoning into large language model-based diagnosis through etiology-aware attention steering [7.092919468004549]
Large Language Models (LLMs) demonstrate significant capabilities in medical text understanding and generation.<n>This study aims to enhance LLMs' diagnostic accuracy and clinical reasoning ability.
arXiv Detail & Related papers (2025-08-01T03:05:43Z) - DocCHA: Towards LLM-Augmented Interactive Online diagnosis System [17.975659876934895]
DocCHA is a confidence-aware, modular framework that emulates clinical reasoning by decomposing the diagnostic process into three stages.<n> evaluated on two real-world Chinese consultation datasets.
arXiv Detail & Related papers (2025-07-10T15:52:04Z) - The Multi-Round Diagnostic RAG Framework for Emulating Clinical Reasoning [10.483453944197407]
We construct DiagnosGraph, a knowledge graph covering both modern medicine and Traditional Chinese Medicine.<n>To bridge the gap between colloquial patient narratives and academic medical knowledge, DiagnosGraph also introduces $1,908$ medical record.<n>Experiments conducted on four medical benchmarks, with evaluations by human physicians, demonstrate that MRD-RAG enhances the diagnostic performance of LLMs.
arXiv Detail & Related papers (2025-04-10T13:17:51Z) - Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning [17.647875658030006]
This study investigates the underlying mechanisms behind the performance degradation phenomenon.<n>We developed a plug-and-play method enhanced (PPME) LLM agent, leveraging over 3.5 million electronic medical records from Chinese and American healthcare facilities.<n>Our approach integrates specialized models for initial disease diagnosis and inquiry into the history of the present illness, trained through supervised and reinforcement learning techniques.
arXiv Detail & Related papers (2025-02-24T06:24:20Z) - Inheritance-guided Hierarchical Assignment for Clinical Automatic
Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.