RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
- URL: http://arxiv.org/abs/2509.19980v1
- Date: Wed, 24 Sep 2025 10:36:14 GMT
- Title: RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
- Authors: Haolin Li, Tianjie Dai, Zhe Chen, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang
- Abstract summary: Retrieval-Augmented Diagnosis (RAD) is a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss, and a dual transformer decoder that uses guidelines as queries.
- Score: 56.373297358647655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical diagnosis is a highly specialized discipline requiring both domain expertise and strict adherence to rigorous guidelines. While current AI-driven medical research predominantly focuses on knowledge graphs or natural text pretraining paradigms to incorporate medical knowledge, these approaches primarily rely on implicitly encoded knowledge within model parameters, neglecting task-specific knowledge required by diverse downstream tasks. To address this limitation, we propose Retrieval-Augmented Diagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Specifically, RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss that constrains the latent distance between multi-modal features and guideline knowledge, and the dual transformer decoder that employs guidelines as queries to steer cross-modal fusion, aligning the models with clinical diagnostic workflows from guideline acquisition to feature extraction and decision-making. Moreover, recognizing the lack of quantitative evaluation of interpretability for multimodal diagnostic models, we introduce a set of criteria to assess the interpretability from both image and text perspectives. Extensive evaluations across four datasets with different anatomies demonstrate RAD's generalizability, achieving state-of-the-art performance. Furthermore, RAD enables the model to concentrate more precisely on abnormal regions and critical indicators, ensuring evidence-based, trustworthy diagnosis. Our code is available at https://github.com/tdlhl/RAD.
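The abstract describes the guideline-enhanced contrastive loss only at a high level, as a constraint on the latent distance between multi-modal features and guideline knowledge. The following is a minimal sketch of one common way to express such a constraint, an InfoNCE-style loss over cosine similarities. It is not the authors' formulation: the function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def guideline_contrastive_loss(features, guidelines, labels, temperature=0.1):
    """Hypothetical InfoNCE-style loss (an assumption, not the paper's loss).

    Pulls each sample's fused feature toward the guideline embedding of its
    own disease label and pushes it away from the other guidelines.
    """
    total = 0.0
    for feat, label in zip(features, labels):
        logits = [cosine(feat, g) / temperature for g in guidelines]
        # Numerically stable log-sum-exp over the guideline axis.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability of the matching guideline.
        total += log_norm - logits[label]
    return total / len(features)


# Toy data (hypothetical): two guideline embeddings and two fused features.
guidelines = [[1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.2, 0.8]]

aligned = guideline_contrastive_loss(features, guidelines, labels=[0, 1])
swapped = guideline_contrastive_loss(features, guidelines, labels=[1, 0])
print(aligned < swapped)
```

In this toy setup the loss is small when each feature lies close to the guideline of its own label and large when the labels are swapped, which is the behavior a distance constraint of this kind is meant to produce.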
Related papers
- MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios.
We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
- Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs).
Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate an oral (viva voce) examination in medical training.
Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z)
- A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making [49.048767633316764]
KAMAC is a knowledge-driven Adaptive Multi-Agent Collaboration framework.
It enables agents to dynamically form and expand expert teams based on the evolving diagnostic context.
Experiments on two real-world medical benchmarks demonstrate that KAMAC significantly outperforms both single-agent and advanced multi-agent methods.
arXiv Detail & Related papers (2025-09-18T14:33:36Z)
- Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI [20.623198882452986]
We have developed a strategy to leverage radiologists' domain knowledge to automate the diagnosis of carotid plaque vulnerability.
This method excels in harnessing cross-modality prior knowledge from limited image annotations and radiology reports within training data.
arXiv Detail & Related papers (2025-09-15T13:38:35Z)
- Expert-Guided Explainable Few-Shot Learning for Medical Image Diagnosis [2.7946918847372277]
We propose an expert-guided explainable few-shot learning framework that integrates radiologist-provided regions of interest into model training.
We evaluate our framework on two distinct datasets: BraTS (MRI) and VinDr-CXR (chest X-ray).
Our findings demonstrate the effectiveness of incorporating expert-guided attention supervision to bridge the gap between performance and interpretability in few-shot medical image diagnosis.
arXiv Detail & Related papers (2025-09-08T05:31:37Z)
- End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning [52.12425911708585]
Deep-DxSearch is an agentic RAG system trained end-to-end with reinforcement learning (RL).
In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources.
Experiments demonstrate that our end-to-end RL training framework consistently outperforms prompt-engineering and training-free RAG approaches.
arXiv Detail & Related papers (2025-08-21T17:42:47Z)
- MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method.
It is designed to enhance interpretability and accuracy in biomedical imaging inquiries.
Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
- MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement [1.6355783973385114]
This paper proposes the Multi-view perception Knowledge-enhanced Transformer (MvKeTR).
An MVPA with view-aware attention is proposed to synthesize diagnostic information from multiple anatomical views effectively.
A Cross-Modal Knowledge Enhancer (CMKE) is devised to retrieve the most similar reports based on the query volume.
arXiv Detail & Related papers (2024-11-27T12:58:23Z)
- Leveraging Expert Input for Robust and Explainable AI-Assisted Lung Cancer Detection in Chest X-rays [2.380494879018844]
This study examines the interpretability and robustness of a high-performing lung cancer detection model based on InceptionV3.
We develop ClinicXAI, an expert-driven approach leveraging the concept bottleneck methodology.
arXiv Detail & Related papers (2024-03-28T14:15:13Z)
- Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images [40.52487429030841]
We propose Knowledge-enhanced Auto Diagnosis (KAD) to guide vision-language pre-training using paired chest X-rays and radiology reports.
We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is comparable to that of fully-supervised models.
arXiv Detail & Related papers (2023-02-27T18:53:10Z)
- Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.