RadDiff: Describing Differences in Radiology Image Sets with Natural Language
- URL: http://arxiv.org/abs/2601.03733v1
- Date: Wed, 07 Jan 2026 09:25:04 GMT
- Title: RadDiff: Describing Differences in Radiology Image Sets with Natural Language
- Authors: Xiaoxian Shen, Yuhui Zhang, Sahithi Ankireddy, Xiaohan Wang, Maya Varma, Henry Guo, Curtis Langlotz, Serena Yeung-Levy,
- Abstract summary: RadDiff is a multimodal agentic system that performs radiologist-style comparative reasoning.<n>On RadDiffBench, RadDiff achieves 47% accuracy, and 50% accuracy when guided by ground-truth reports.
- Score: 52.774737253784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how two radiology image sets differ is critical for generating clinical insights and for interpreting medical AI systems. We introduce RadDiff, a multimodal agentic system that performs radiologist-style comparative reasoning to describe clinically meaningful differences between paired radiology studies. RadDiff builds on a proposer-ranker framework from VisDiff, and incorporates four innovations inspired by real diagnostic workflows: (1) medical knowledge injection through domain-adapted vision-language models; (2) multimodal reasoning that integrates images with their clinical reports; (3) iterative hypothesis refinement across multiple reasoning rounds; and (4) targeted visual search that localizes and zooms in on salient regions to capture subtle findings. To evaluate RadDiff, we construct RadDiffBench, a challenging benchmark comprising 57 expert-validated radiology study pairs with ground-truth difference descriptions. On RadDiffBench, RadDiff achieves 47% accuracy, and 50% accuracy when guided by ground-truth reports, significantly outperforming the general-domain VisDiff baseline. We further demonstrate RadDiff's versatility across diverse clinical tasks, including COVID-19 phenotype comparison, racial subgroup analysis, and discovery of survival-related imaging features. Together, RadDiff and RadDiffBench provide the first method-and-benchmark foundation for systematically uncovering meaningful differences in radiological data.
Related papers
- RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - FoundDiff: Foundational Diffusion Model for Generalizable Low-Dose CT Denoising [55.04342933312839]
We propose FoundDiff, a foundational diffusion model for unified and generalizable low-dose computed tomography (CT) denoising.<n>FoundDiff employs a two-stage strategy: (i) dose-anatomy perception and (ii) adaptive denoising.<n>First, we develop a dose- and anatomy-aware contrastive language image pre-training model (DA-CLIP) to achieve robust dose and anatomy perception.<n>Second, we design a dose- and anatomy-aware diffusion model (DA-Diff) to perform adaptive and generalizable denoising.
arXiv Detail & Related papers (2025-08-24T11:03:56Z) - A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation [4.61181046331792]
We propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic process for clinically accurate and explainable RRG.<n>To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference.<n>Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) a evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments to demonstrate the effectiveness of CoD.
arXiv Detail & Related papers (2025-08-13T07:32:28Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report
Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-rays reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - Enhanced Knowledge Injection for Radiology Report Generation [21.937372129714884]
We propose an enhanced knowledge injection framework, which utilizes two branches to extract different types of knowledge.
By integrating this finer-grained and well-structured knowledge with the current image, we are able to leverage the multi-source knowledge gain to ultimately facilitate more accurate report generation.
arXiv Detail & Related papers (2023-11-01T09:50:55Z) - Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions.
In X-RGen, we seek to mimic the behaviour of human radiologists, breaking them down into four principal phases.
We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z) - Local Contrastive Learning for Medical Image Recognition [0.0]
Local Region Contrastive Learning (LRCLR) is a flexible fine-tuning framework that adds layers for significant image region selection and cross-modality interaction.
Our results on an external validation set of chest x-rays suggest that LRCLR identifies significant local image regions and provides meaningful interpretation against radiology text.
arXiv Detail & Related papers (2023-03-24T17:04:26Z) - Differential Diagnosis of Frontotemporal Dementia and Alzheimer's
Disease using Generative Adversarial Network [0.0]
Frontotemporal dementia and Alzheimer's disease are two common forms of dementia and are easily misdiagnosed as each other.
Differentiating between the two dementia types is crucial for determining disease-specific intervention and treatment.
Recent development of Deep-learning-based approaches in the field of medical image computing are delivering some of the best performance for many binary classification tasks.
arXiv Detail & Related papers (2021-09-12T22:40:50Z) - Variational Knowledge Distillation for Disease Classification in Chest
X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.