X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning
- URL: http://arxiv.org/abs/2508.12455v1
- Date: Sun, 17 Aug 2025 18:00:41 GMT
- Title: X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning
- Authors: Chee Ng, Liliang Sun, Shaoqing Tang
- Abstract summary: We propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework for intelligent chest X-ray diagnosis and interpretable report generation. X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts. It achieves competitive quantitative performance, with a Balanced Accuracy of 80.52% and an F1 score of 78.65% for disease diagnosis.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chest X-ray imaging is crucial for diagnosing pulmonary and cardiac diseases, yet its interpretation demands extensive clinical experience and suffers from inter-observer variability. While deep learning models offer high diagnostic accuracy, their black-box nature hinders clinical adoption in high-stakes medical settings. To address this, we propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework leveraging Large Vision-Language Models (LVLMs) for intelligent chest X-ray diagnosis and interpretable report generation. X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts, then employing an LLM-based component with a structured Chain-of-Thought prompting strategy to reason and produce detailed natural language diagnostic reports. Evaluated on the CORDA dataset, X-Ray-CoT achieves competitive quantitative performance, with a Balanced Accuracy of 80.52% and an F1 score of 78.65% for disease diagnosis, slightly surpassing existing black-box models. Crucially, it uniquely generates high-quality, explainable reports, as validated by preliminary human evaluations. Our ablation studies confirm the integral role of each proposed component, highlighting the necessity of multi-modal fusion and CoT reasoning for robust and transparent medical AI. This work represents a significant step towards trustworthy and clinically actionable AI systems in medical imaging.
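The two-stage pipeline described in the abstract (Stage 1: extract visual concepts; Stage 2: structured Chain-of-Thought prompting of an LLM) can be sketched as below. This is a minimal, hypothetical reconstruction: all class, function, and prompt-step names are illustrative and are not taken from the paper.

```python
# Hypothetical sketch of the X-Ray-CoT two-stage pipeline; names are
# illustrative, not from the paper's actual implementation.
from dataclasses import dataclass, field


@dataclass
class VisualConcepts:
    """Stage 1 output: visual concepts extracted from a chest X-ray."""
    findings: list[str]                      # e.g. ["cardiomegaly"]
    regions: dict[str, str] = field(default_factory=dict)  # finding -> location


def build_cot_prompt(concepts: VisualConcepts) -> str:
    """Stage 2: assemble a structured chain-of-thought prompt for the LLM."""
    lines = [
        "You are a radiologist. Reason step by step.",
        "Step 1 - Observations:",
    ]
    for f in concepts.findings:
        region = concepts.regions.get(f, "unspecified region")
        lines.append(f"  - {f} ({region})")
    lines += [
        "Step 2 - Differential diagnosis: relate each observation to candidate diseases.",
        "Step 3 - Conclusion: state the most likely diagnosis.",
        "Step 4 - Report: write a concise diagnostic report.",
    ]
    return "\n".join(lines)


# Example usage with hand-written Stage 1 output (a real system would
# produce this from a vision encoder / concept extractor).
concepts = VisualConcepts(
    findings=["cardiomegaly", "pleural effusion"],
    regions={
        "cardiomegaly": "cardiac silhouette",
        "pleural effusion": "left costophrenic angle",
    },
)
prompt = build_cot_prompt(concepts)
print(prompt)
```

The prompt string would then be sent, together with the image features, to the LVLM; the structured steps are what forces the model to expose intermediate reasoning rather than a bare label.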
Related papers
- A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice [83.11942224668127]
Janus-Pro-CXR (1B) is a chest X-ray interpretation system based on the DeepSeek Janus-Pro model. Our system outperforms state-of-the-art X-ray report generation models in automated report generation.
arXiv Detail & Related papers (2025-12-23T13:26:13Z) - A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation [15.331803613974365]
We propose a novel dual-stage disease-aware framework for chest X-ray report generation. In Stage 1, our model learns Disease-Aware Semantic Tokens (DASTs) corresponding to specific pathology categories. In Stage 2, we introduce a Disease-Visual Attention Fusion module to integrate disease-aware representations with visual features.
arXiv Detail & Related papers (2025-11-15T15:31:51Z) - Reasoning Visual Language Model for Chest X-Ray Analysis [30.318629424154206]
We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation. Inspired by reasoning-first training paradigms, our approach is designed to learn how experts reason, not just what they conclude. We release code and the model NV-Reason-CXR-3B to support community progress toward trustworthy, explainable AI in chest radiography.
arXiv Detail & Related papers (2025-10-28T00:48:00Z) - An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas. We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head. Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z) - A Foundation Model for Chest X-ray Interpretation with Grounded Reasoning via Online Reinforcement Learning [41.27625400846057]
DeepMedix-R1 is a holistic medical foundation model (FM) for chest X-ray (CXR) interpretation. It generates both an answer and reasoning steps tied to the image's local regions for each query.
arXiv Detail & Related papers (2025-09-04T06:00:04Z) - RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent, evidence-based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z) - CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models [1.7042756021131187]
This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system. The CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification. The RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports.
arXiv Detail & Related papers (2025-04-29T16:14:55Z) - Leveraging Expert Input for Robust and Explainable AI-Assisted Lung Cancer Detection in Chest X-rays [2.380494879018844]
This study examines the interpretability and robustness of a high-performing lung cancer detection model based on InceptionV3. We develop ClinicXAI, an expert-driven approach leveraging the concept bottleneck methodology.
arXiv Detail & Related papers (2024-03-28T14:15:13Z) - Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning [16.849933628738277]
Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists.
This paper introduces a novel RRG method, LM-RRG, that integrates large models (LMs) with clinical quality reinforcement learning.
Experiments on the MIMIC-CXR and IU-Xray datasets demonstrate the superiority of our method over the state of the art.
arXiv Detail & Related papers (2024-03-11T13:47:11Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
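The K-nearest neighbor smoothing (KNNS) idea from the last entry above can be sketched generically: replace each sample's predicted score with the average over its k nearest neighbors in feature space. This is an illustrative reconstruction of the general technique, not the paper's exact formulation; all names are hypothetical.

```python
# Generic sketch of K-nearest neighbor smoothing of per-sample predictions;
# an illustrative reconstruction, not the paper's exact method.
import numpy as np


def knn_smooth(features: np.ndarray, probs: np.ndarray, k: int = 2) -> np.ndarray:
    """Average each sample's score over its k nearest feature-space neighbors
    (the sample itself is included, since its self-distance is zero)."""
    # Pairwise Euclidean distances between all samples, shape (n, n).
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Indices of the k closest samples per row.
    neighbors = np.argsort(dists, axis=1)[:, :k]
    return probs[neighbors].mean(axis=1)


# Two nearby samples and one outlier: smoothing pulls each score toward
# its closest neighbor's score.
feats = np.array([[0.0], [0.1], [5.0]])
scores = np.array([0.9, 0.8, 0.1])
smoothed = knn_smooth(feats, scores, k=2)
print(smoothed)  # roughly [0.85, 0.85, 0.45]
```

In a disease-identification setting the features would be the encoder embeddings of each chest X-ray and the scores the per-disease probabilities, so smoothing regularizes predictions across visually similar studies.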
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.