Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration
- URL: http://arxiv.org/abs/2505.06898v1
- Date: Sun, 11 May 2025 08:32:01 GMT
- Title: Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration
- Authors: Honglong Yang, Shanshan Song, Yi Qin, Lehan Wang, Haonan Wang, Xinpeng Ding, Qixiang Zhang, Bodong Du, Xiaomeng Li
- Abstract summary: Generalist Medical AI (GMAI) systems have demonstrated expert-level performance in biomedical perception tasks. Here, we present XMedGPT, a clinician-centric, multi-modal AI assistant that integrates textual and visual interpretability. We validate XMedGPT across four pillars: multi-modal interpretability, uncertainty quantification, prognostic modeling, and rigorous benchmarking.
- Score: 17.11245701879749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalist Medical AI (GMAI) systems have demonstrated expert-level performance in biomedical perception tasks, yet their clinical utility remains limited by inadequate multi-modal explainability and suboptimal prognostic capabilities. Here, we present XMedGPT, a clinician-centric, multi-modal AI assistant that integrates textual and visual interpretability to support transparent and trustworthy medical decision-making. XMedGPT not only produces accurate diagnostic and descriptive outputs, but also grounds referenced anatomical sites within medical images, bridging critical gaps in interpretability and enhancing clinician usability. To support real-world deployment, we introduce a reliability indexing mechanism that quantifies uncertainty through consistency-based assessment via interactive question answering. We validate XMedGPT across four pillars: multi-modal interpretability, uncertainty quantification, prognostic modeling, and rigorous benchmarking. The model achieves an IoU of 0.703 across 141 anatomical regions and a Kendall's tau-b of 0.479, demonstrating strong alignment between visual rationales and clinical outcomes. For uncertainty estimation, it attains an AUC of 0.862 on visual question answering and 0.764 on radiology report generation. In survival and recurrence prediction for lung and glioma cancers, it surpasses prior leading models by 26.9% and outperforms GPT-4o by 25.0%. Rigorous benchmarking across 347 datasets covering 40 imaging modalities, with external validation spanning 4 anatomical systems, confirms exceptional generalizability, with performance gains over existing GMAI of 20.7% in in-domain evaluation and 16.7% on 11,530 in-house cases. Together, XMedGPT represents a significant leap forward in clinician-centric AI integration, offering trustworthy and scalable support for diverse healthcare applications.
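The abstract gives no implementation details for the reliability indexing mechanism; below is a minimal Python sketch of what a consistency-based reliability index could look like. The `ask_model` query function and the paraphrase-agreement heuristic are assumptions for illustration, not the paper's exact method.

```python
from collections import Counter
from typing import Callable

def reliability_index(ask_model: Callable[[object, str], str],
                      image: object,
                      question: str,
                      paraphrases: list[str]) -> float:
    """Score reliability as answer agreement across rephrased questions.

    Queries the model with the original question plus semantically
    equivalent paraphrases; the fraction of answers matching the modal
    answer serves as a crude consistency-based confidence proxy.
    """
    answers = [ask_model(image, q) for q in (question, *paraphrases)]
    normalized = [a.strip().lower() for a in answers]
    modal_count = Counter(normalized).most_common(1)[0][1]
    return modal_count / len(normalized)  # 1.0 = fully consistent answers
```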
Related papers
- Multimodal Attention-Aware Fusion for Diagnosing Distal Myopathy: Evaluating Model Interpretability and Clinician Trust [19.107204920543676]
Distal myopathy represents a heterogeneous group of skeletal muscle disorders with broad clinical manifestations. We propose a novel multimodal attention-aware fusion architecture that combines features extracted from two distinct deep learning models. Our approach integrates these features through an attention gate mechanism, enhancing both predictive performance and interpretability.
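The abstract does not specify the architecture; a minimal PyTorch sketch of one plausible attention-gate fusion of two backbone feature vectors follows. The layer sizes and the gated convex combination are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AttentionGateFusion(nn.Module):
    """Illustrative attention-gate fusion of features from two models."""

    def __init__(self, dim_a: int, dim_b: int, fused_dim: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, fused_dim)
        self.proj_b = nn.Linear(dim_b, fused_dim)
        # The gate learns per-dimension weights from the concatenated features.
        self.gate = nn.Sequential(nn.Linear(2 * fused_dim, fused_dim), nn.Sigmoid())

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        a, b = self.proj_a(feat_a), self.proj_b(feat_b)
        g = self.gate(torch.cat([a, b], dim=-1))
        return g * a + (1 - g) * b  # gated convex combination of the two streams
```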
arXiv Detail & Related papers (2025-08-02T11:08:55Z)
- Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision-Language Models (MedVLMs) by adjusting model structure, fine-tuning with high-quality data, or preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled Classifier-Free Guidance (Expert-CFG) to align MedVLMs with clinical expertise without additional training.
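The framework's exact formulation is not reproduced in this summary; assuming Expert-CFG follows the standard classifier-free guidance recipe (an inference from the name only), a training-free blend of expert-conditioned and plain next-token logits might look like:

```python
import torch

def expert_cfg_logits(logits_plain: torch.Tensor,
                      logits_expert: torch.Tensor,
                      guidance_scale: float = 1.5) -> torch.Tensor:
    """Classifier-free-guidance-style logit blend (illustrative only).

    logits_plain:  next-token logits without expert hints in the prompt.
    logits_expert: next-token logits with expert annotations included.
    Shifts decoding toward the expert-conditioned distribution without
    any additional training of the underlying MedVLM.
    """
    return logits_plain + guidance_scale * (logits_expert - logits_plain)
```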
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
- RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent, evidence-based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z)
- ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding [3.5568372183159203]
ReXVQA is the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology. It comprises approximately 696,000 questions paired with 160,000 chest X-ray studies across training, validation, and test sets. We evaluate eight state-of-the-art multimodal large language models, including MedGemma-4B-it, Qwen2.5-VL, Janus-Pro-7B, and Eagle2-9B.
arXiv Detail & Related papers (2025-06-04T18:11:59Z)
- MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility [38.33724495011223]
We introduce MedOrch, a novel framework that orchestrates specialized tools and reasoning agents to provide comprehensive medical decision support. We evaluate MedOrch across three medical applications: Alzheimer's disease diagnosis, chest X-ray interpretation, and medical visual question answering.
arXiv Detail & Related papers (2025-05-30T21:13:12Z)
- CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models [1.7042756021131187]
This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system. CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification. The RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports.
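Concept bottleneck models have a well-known generic structure; a minimal PyTorch sketch is below. The concept names, layer sizes, and sigmoid concept head are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConceptBottleneckHead(nn.Module):
    """Features -> named clinical concepts -> diagnosis (illustrative)."""

    def __init__(self, feat_dim: int, concept_names: list[str], num_classes: int):
        super().__init__()
        self.concept_names = concept_names  # e.g. ["cardiomegaly", "pleural effusion"]
        self.to_concepts = nn.Linear(feat_dim, len(concept_names))
        # A single linear layer over concepts keeps the final decision inspectable.
        self.classifier = nn.Linear(len(concept_names), num_classes)

    def forward(self, features: torch.Tensor):
        concepts = torch.sigmoid(self.to_concepts(features))  # concept probabilities
        logits = self.classifier(concepts)
        return logits, concepts  # concepts expose the model's rationale
```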
arXiv Detail & Related papers (2025-04-29T16:14:55Z)
- AI-Driven MRI Spine Pathology Detection: A Comprehensive Deep Learning Approach for Automated Diagnosis in Diverse Clinical Settings [0.0]
This study presents the development of an autonomous AI system for MRI spine pathology detection, trained on a dataset of 2 million MRI spine scans. The dataset is balanced across age groups, genders, and scanner manufacturers to ensure robustness and adaptability. The system was deployed across 13 major healthcare enterprises in India, encompassing diagnostic centers, large hospitals, and government facilities.
arXiv Detail & Related papers (2025-03-26T08:33:03Z)
- 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark [0.29987253996125257]
Large Vision-Language Models (LVLMs) are being explored for applications in telemedicine, yet their ability to engage with diverse patient behaviors remains underexplored. We introduce 3MDBench, an open-source evaluation framework designed to assess LLM-driven medical consultations. The benchmark integrates textual and image-based patient data across 34 common diagnoses, mirroring real-world telemedicine interactions.
arXiv Detail & Related papers (2025-03-26T07:32:05Z)
- A Scalable Approach to Benchmarking the In-Conversation Differential Diagnostic Accuracy of a Health AI [0.0]
This study introduces a scalable benchmarking methodology for assessing health AI systems. Our methodology employs 400 validated clinical vignettes across 14 medical specialties, using AI-powered patient actors to simulate realistic clinical interactions. August achieved a top-one diagnostic accuracy of 81.8% (327/400 cases) and a top-two accuracy of 85.0% (340/400 cases), significantly outperforming traditional symptom checkers.
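Top-k differential diagnostic accuracy reduces to a simple count over cases; a short sketch that reproduces the reported figures (the data structures are hypothetical):

```python
def top_k_accuracy(ranked_ddx: list[list[str]], truth: list[str], k: int) -> float:
    """Fraction of cases whose true diagnosis appears in the top-k differential."""
    hits = sum(t in ddx[:k] for ddx, t in zip(ranked_ddx, truth))
    return hits / len(truth)

# The reported numbers follow directly from the case counts:
# top-one: 327 / 400 = 0.8175 -> 81.8%; top-two: 340 / 400 = 0.850 -> 85.0%
```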
arXiv Detail & Related papers (2024-12-17T05:02:33Z)
- Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare [0.2302001830524133]
Biased AI-generated medical advice and misdiagnoses can jeopardize patient safety.
This study introduces new resources designed to promote ethical and precise AI in healthcare.
arXiv Detail & Related papers (2024-10-09T06:00:05Z)
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark to date, featuring a well-categorized data structure and multiple perceptual granularities.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinically related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address the limitations of task-specific biomedical models, owing to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)
- Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model can be trained in a distributed fashion and executed independently at each host institution.
Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
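UCADI's exact protocol is not detailed in this summary; a minimal FedAvg-style sketch of how per-hospital models could be aggregated without sharing raw CT scans is shown below. The function names and the size-weighted mean are assumptions for illustration.

```python
import numpy as np

def federated_average(site_weights: list[dict[str, np.ndarray]],
                      site_sizes: list[int]) -> dict[str, np.ndarray]:
    """Size-weighted average of per-institution parameter dictionaries.

    Each hospital trains locally and ships only model parameters, never
    patient scans; the coordinator returns the aggregated global model.
    """
    total = sum(site_sizes)
    return {
        name: sum(w[name] * (n / total) for w, n in zip(site_weights, site_sizes))
        for name in site_weights[0]
    }
```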
arXiv Detail & Related papers (2021-11-18T00:43:41Z)