Reasoning Visual Language Model for Chest X-Ray Analysis
- URL: http://arxiv.org/abs/2510.23968v2
- Date: Thu, 30 Oct 2025 00:14:35 GMT
- Title: Reasoning Visual Language Model for Chest X-Ray Analysis
- Authors: Andriy Myronenko, Dong Yang, Baris Turkbey, Mariam Aboian, Sena Azamat, Esra Akcicek, Hongxu Yin, Pavlo Molchanov, Marc Edgar, Yufan He, Pengfei Guo, Yucheng Tang, Daguang Xu,
- Abstract summary: We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation.<n>Inspired by reasoning-first training paradigms, our approach is designed to learn how experts reason, not just what they conclude.<n>We release code and the model NV-Reason-CXR-3B to support community progress toward trustworthy, explainable AI in chest radiography.
- Score: 30.318629424154206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language models (VLMs) have shown strong promise for medical image analysis, but most remain opaque, offering predictions without the transparent, stepwise reasoning clinicians rely on. We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation. Inspired by reasoning-first training paradigms, our approach is designed to learn how experts reason, not just what they conclude, by aligning intermediate steps with observable image evidence and radiology workflow. Beyond accuracy, the explicit reasoning traces support clinical auditability: they reveal why a conclusion was reached, which alternatives were considered, and where uncertainty remains, enabling quality assurance, error analysis, and safer human-AI collaboration. Our model couples high-fidelity visual encoding with a two-stage training recipe: a reasoning-style supervised fine-tuning (SFT) followed by reinforcement learning (RL) that uses verifiable rewards over a list of X-ray abnormalities. The model outputs reasoning that mirrors radiologists systematic thought process, uncertainty, and differential diagnosis. In out-of-distribution evaluation, the approach achieves competitive multi-label classification while improving interpretability. In a reader study with expert radiologists, full reasoning traces increased confidence, supported error auditing, and reduced time to finalize reports. We release code and the model NV-Reason-CXR-3B to support community progress toward trustworthy, explainable AI in chest radiography and other medical imaging tasks where reasoning quality is as critical as prediction quality.
Related papers
- Making medical vision-language models think causally across modalities with retrieval-augmented cross-modal reasoning [16.243806723551454]
Medical vision-language models (VLMs) achieve strong performance in diagnostic reporting and image-text alignment.<n>Their underlying reasoning mechanisms remain fundamentally correlational, exhibiting reliance on superficial statistical associations.<n>We propose Multimodal Causal Retrieval-Augmented Generation, a framework that integrates causal inference principles with multimodal retrieval.
arXiv Detail & Related papers (2026-01-26T11:03:00Z) - AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists.<n>By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback.<n> Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - Brittleness and Promise: Knowledge Graph Based Reward Modeling for Diagnostic Reasoning [8.35131510062609]
Large language models (LLMs) show promise for diagnostic reasoning but often lack reliable, knowledge grounded inference.<n>We treat the LLM as a reward model of KG reasoning paths, where the model learns to judge whether a candidate path leads to correct diagnosis for a given patient input.<n>Our finding provides the first systematic assessment of "reward model style" reasoning over clinical KGs.
arXiv Detail & Related papers (2025-09-22T18:39:09Z) - End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning [52.12425911708585]
Deep-DxSearch is an agentic RAG system trained end-to-end with reinforcement learning (RL)<n>In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources.<n> Experiments demonstrate that our end-to-end RL training framework consistently outperforms prompt-engineering and training-free RAG approaches.
arXiv Detail & Related papers (2025-08-21T17:42:47Z) - X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning [0.0]
We propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework for intelligent chest X-ray diagnosis and interpretable report generation.<n>X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts.<n>It achieves competitive quantitative performance, with a Balanced Accuracy of 80.52% and F1 score of 78.65% for disease diagnosis.
arXiv Detail & Related papers (2025-08-17T18:00:41Z) - Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z) - MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning [29.84956540178252]
Reasoning is a critical frontier for advancing medical image analysis.<n>We introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning.<n>MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks.
arXiv Detail & Related papers (2025-02-26T23:57:34Z) - VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback [1.5839621757142595]
We propose a novel framework designed to enhance the semantic alignment and localization accuracy of AI-generated medical reports.<n>By comparing features between the original and generated images, we introduce a dual-scoring system.<n>This approach significantly outperforms existing methods, achieving state-of-the-art results in pathology localization and text-to-image alignment.
arXiv Detail & Related papers (2025-01-29T16:02:16Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG)<n>MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner.<n>We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Interpretable Vertebral Fracture Diagnosis [69.68641439851777]
Black-box neural network models learn clinically relevant features for fracture diagnosis.
This work identifies the concepts networks use for vertebral fracture diagnosis in CT images.
arXiv Detail & Related papers (2022-03-30T13:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.