Related papers: RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis

RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis

URL: http://arxiv.org/abs/2509.19980v1
Date: Wed, 24 Sep 2025 10:36:14 GMT
Title: RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
Authors: Haolin Li, Tianjie Dai, Zhe Chen, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang,
Abstract summary: Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
Score: 56.373297358647655
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinical diagnosis is a highly specialized discipline requiring both domain expertise and strict adherence to rigorous guidelines. While current AI-driven medical research predominantly focuses on knowledge graphs or natural text pretraining paradigms to incorporate medical knowledge, these approaches primarily rely on implicitly encoded knowledge within model parameters, neglecting task-specific knowledge required by diverse downstream tasks. To address this limitation, we propose Retrieval-Augmented Diagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Specifically, RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss that constrains the latent distance between multi-modal features and guideline knowledge, and the dual transformer decoder that employs guidelines as queries to steer cross-modal fusion, aligning the models with clinical diagnostic workflows from guideline acquisition to feature extraction and decision-making. Moreover, recognizing the lack of quantitative evaluation of interpretability for multimodal diagnostic models, we introduce a set of criteria to assess the interpretability from both image and text perspectives. Extensive evaluations across four datasets with different anatomies demonstrate RAD's generalizability, achieving state-of-the-art performance. Furthermore, RAD enables the model to concentrate more precisely on abnormal regions and critical indicators, ensuring evidence-based, trustworthy diagnosis. Our code is available at https://github.com/tdlhl/RAD.

Related papers

MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks with complex clinical reasoning required in real-world scenarios.<n>We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs)<n>Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a (oral) examination in medical training.<n>Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z)
A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making [49.048767633316764]
KAMAC is a knowledge-driven Adaptive Multi-Agent Collaboration framework.<n>It enables agents to dynamically form and expand expert teams based on the evolving diagnostic context.<n> Experiments on two real-world medical benchmarks demonstrate that KAMAC significantly outperforms both single-agent and advanced multi-agent methods.
arXiv Detail & Related papers (2025-09-18T14:33:36Z)
Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI [20.623198882452986]
We have developed a strategy to leverage radiologists' domain knowledge to automate the diagnosis of carotid plaque vulnerability.<n>This method excels in harnessing cross-modality prior knowledge from limited image annotations and radiology reports within training data.
arXiv Detail & Related papers (2025-09-15T13:38:35Z)
Expert-Guided Explainable Few-Shot Learning for Medical Image Diagnosis [2.7946918847372277]
We propose an expert-guided explainable few-shot learning framework that integrates radiologist-provided regions of interest into model training.<n>We evaluate our framework on two distinct datasets: BraTS (MRI) and VinDr-CXR (Chest X-ray)<n>Our findings demonstrate the effectiveness of incorporating expert-guided attention supervision to bridge the gap between performance and interpretability in few-shot medical image diagnosis.
arXiv Detail & Related papers (2025-09-08T05:31:37Z)
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning [52.12425911708585]
Deep-DxSearch is an agentic RAG system trained end-to-end with reinforcement learning (RL)<n>In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources.<n> Experiments demonstrate that our end-to-end RL training framework consistently outperforms prompt-engineering and training-free RAG approaches.
arXiv Detail & Related papers (2025-08-21T17:42:47Z)
MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method.<n>It is designed to enhance interpretability and accuracy in biomedical imaging inquiries.<n> Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
MvKeTR: Chest CT Report Generation with Multi-View Perception and Knowledge Enhancement [1.6355783973385114]
Multi-view perception knowledge-enhanced TansfoRmer (MvKeTR)<n>MVPA with view-aware attention is proposed to synthesize diagnostic information from multiple anatomical views effectively.<n>Cross-Modal Knowledge Enhancer (CMKE) is devised to retrieve the most similar reports based on the query volume.
arXiv Detail & Related papers (2024-11-27T12:58:23Z)
Leveraging Expert Input for Robust and Explainable AI-Assisted Lung Cancer Detection in Chest X-rays [2.380494879018844]
This study examines the interpretability and robustness of a high-performing lung cancer detection model based on InceptionV3.<n>We develop ClinicXAI, an expert-driven approach leveraging the concept bottleneck methodology.
arXiv Detail & Related papers (2024-03-28T14:15:13Z)
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images [40.52487429030841]
We propose Knowledge-enhanced Auto Diagnosis (KAD) to guide vision-supervised pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is superior to that of fully-language models.
arXiv Detail & Related papers (2023-02-27T18:53:10Z)
Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patients clinical state. We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability. Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z)
Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.