CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models
- URL: http://arxiv.org/abs/2504.20898v2
- Date: Sun, 04 May 2025 22:38:21 GMT
- Title: CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models
- Authors: Hasan Md Tusfiqur Alam, Devansh Srivastav, Abdulrahman Mohamed Selim, Md Abdul Kadir, Md Moktadirul Hoque Shuvo, Daniel Sonntag,
- Abstract summary: This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system.<n>CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification.<n>RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports.
- Score: 1.7042756021131187
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Advancements in generative Artificial Intelligence (AI) hold great promise for automating radiology workflows, yet challenges in interpretability and reliability hinder clinical adoption. This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system to bridge AI performance with clinical explainability. CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification. Meanwhile, the RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports. Our demonstration showcases the system's ability to deliver interpretable predictions, mitigate hallucinations, and generate high-quality, tailored reports with an interactive interface addressing accuracy, trust, and usability challenges. This framework provides a pathway to improving diagnostic consistency and empowering radiologists with actionable insights.
Related papers
- AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists.<n>By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback.<n> Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z) - SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction [49.355973075150075]
We introduce SurvAgent, the first hierarchical chain-of-thought (CoT)-enhanced multi-agent system for multimodal survival prediction.<n>SurvAgent consists of two stages: WSI-Gene CoT-Enhanced Case Bank Construction employs hierarchical analysis through Low-Magnification Screening, Cross-Modal Similarity-Aware Patch Mining, and Confidence-Aware Patch Mining for pathology images.<n>Dichotomy-Based Multi-Expert Agent Inference retrieves similar cases via RAG and integrates multimodal reports with expert predictions through progressive interval refinement.
arXiv Detail & Related papers (2025-11-20T18:41:44Z) - CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation [62.0150409256153]
We propose CXRAgent, a director-orchestrated, multi-stage agent for CXR interpretation.<n>The agent strategically orchestrates a set of CXR-analysis tools, with outputs normalized and verified by the Evidence-driven Validator.<n>Experiments on various CXR interpretation tasks show that CXRAgent delivers strong performance, providing visual evidence and generalizes well to clinical tasks of different complexity.
arXiv Detail & Related papers (2025-10-24T10:31:30Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - Intelligent Healthcare Imaging Platform: A VLM-Based Framework for Automated Medical Image Analysis and Clinical Report Generation [0.0]
This work presents an intelligent multimodal framework for medical image analysis that leverages Vision-Language Models (VLMs)<n>The framework integrates Google Gemini 2.5 Flash for automated tumor detection and clinical report generation across multiple imaging modalities including CT, MRI, X-ray, and Ultrasound.
arXiv Detail & Related papers (2025-09-16T23:15:44Z) - X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning [0.0]
We propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework for intelligent chest X-ray diagnosis and interpretable report generation.<n>X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts.<n>It achieves competitive quantitative performance, with a Balanced Accuracy of 80.52% and F1 score of 78.65% for disease diagnosis.
arXiv Detail & Related papers (2025-08-17T18:00:41Z) - RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation.<n>System employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent and evidence based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z) - Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation [0.0]
This paper introduces CXR-PathFinder, a novel Large Language Model (LLM)-centric foundation model specifically engineered for automated chest X-ray (CXR) report generation.<n>We propose a unique training paradigm, Clinician-Guided Adrial Fine-Tuning (CGAFT), which meticulously integrates expert clinical feedback into an adversarial learning framework.<n>Our experiments demonstrate that CXR-PathFinder significantly outperforms existing state-of-the-art medical vision-language models across various quantitative metrics.
arXiv Detail & Related papers (2025-06-01T18:47:49Z) - A Multimodal Multi-Agent Framework for Radiology Report Generation [2.1477122604204433]
Radiology report generation (RRG) aims to automatically produce diagnostic reports from medical images.<n>We propose a multimodal multi-agent framework for RRG that aligns with the stepwise clinical reasoning workflow.
arXiv Detail & Related papers (2025-05-14T20:28:04Z) - Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings [44.99833362998488]
This paper introduces CXR-TextInter, a novel framework that repurposes powerful text-centric language models for chest X-rays interpretation.<n>We augment this LLM-centric approach with an integrated medical knowledge module to enhance clinical reasoning.<n>Our work validates an alternative paradigm for medical image AI, showcasing the potential of harnessing advanced LLM capabilities.
arXiv Detail & Related papers (2025-05-03T06:18:12Z) - VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback [1.5839621757142595]
We propose a novel framework designed to enhance the semantic alignment and localization accuracy of AI-generated medical reports.
By comparing features between the original and generated images, we introduce a dual-scoring system.
This approach significantly outperforms existing methods, achieving state-of-the-art results in pathology localization and text-to-image alignment.
arXiv Detail & Related papers (2025-01-29T16:02:16Z) - Hybrid Interpretable Deep Learning Framework for Skin Cancer Diagnosis: Integrating Radial Basis Function Networks with Explainable AI [1.1049608786515839]
Skin cancer is one of the most prevalent and potentially life-threatening diseases worldwide.<n>We propose a novel hybrid deep learning framework that integrates convolutional neural networks (CNNs) with Radial Basis Function (RBF) Networks to achieve high classification accuracy and enhanced interpretability.
arXiv Detail & Related papers (2025-01-24T19:19:02Z) - Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG [1.9374282535132377]
This study enhances interpretability in Chest X-ray (CXR) classification by using concept bottleneck models (CBMs) and a multi-agent Retrieval-Augmented Generation (RAG) system for report generation.<n>By modeling relationships between visual features and clinical concepts, we create interpretable concept vectors that guide a multi-agent RAG system to generate radiology reports.
arXiv Detail & Related papers (2024-12-20T17:33:50Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report [4.340464264725625]
We introduce a novel Multi-Modal Contrastive Pre-training Framework that synergistically combines X-rays, electrocardiograms (ECGs) and radiology/cardiology reports.
We utilize LoRA-Peft to significantly reduce trainable parameters in the LLM and incorporate recent linear attention dropping strategy in the Vision Transformer(ViT) for smoother attention.
To the best of our knowledge, we are the first to propose an integrated model that combines X-ray, ECG, and Radiology/Cardiology Report with this approach.
arXiv Detail & Related papers (2024-10-21T17:42:41Z) - Analyzing the Effect of $k$-Space Features in MRI Classification Models [0.0]
We have developed an explainable AI methodology tailored for medical imaging.
We employ a Convolutional Neural Network (CNN) that analyzes MRI scans across both image and frequency domains.
This approach not only enhances early training efficiency but also deepens our understanding of how additional features impact the model predictions.
arXiv Detail & Related papers (2024-09-20T15:43:26Z) - AutoRG-Brain: Grounded Report Generation for Brain MRI [57.22149878985624]
Radiologists are tasked with interpreting a large number of images in a daily base, with the responsibility of generating corresponding reports.
This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies.
We initiate a series of work on grounded Automatic Report Generation (AutoRG)
This system supports the delineation of brain structures, the localization of anomalies, and the generation of well-organized findings.
arXiv Detail & Related papers (2024-07-23T17:50:00Z) - RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance [49.04445246682948]
Conversational AI tools can generate and discuss clinically correct radiology reports for a given medical image.<n>RaDialog is the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog.<n>Our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions.
arXiv Detail & Related papers (2023-11-30T16:28:40Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report
Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-rays reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM)
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.