Related papers: RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment

RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment

URL: http://arxiv.org/abs/2501.07525v1
Date: Mon, 13 Jan 2025 17:55:32 GMT
Title: RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
Authors: Difei Gu, Yunhe Gao, Yang Zhou, Mu Zhou, Dimitris Metaxas,
Abstract summary: RadAlign is a novel framework that combines the predictive accuracy of vision-language models with the reasoning capabilities of large language models.<n>Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI.
Score: 10.67889367763112
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Automated chest radiographs interpretation requires both accurate disease classification and detailed radiology report generation, presenting a significant challenge in the clinical workflow. Current approaches either focus on classification accuracy at the expense of interpretability or generate detailed but potentially unreliable reports through image captioning techniques. In this study, we present RadAlign, a novel framework that combines the predictive accuracy of vision-language models (VLMs) with the reasoning capabilities of large language models (LLMs). Inspired by the radiologist's workflow, RadAlign first employs a specialized VLM to align visual features with key medical concepts, achieving superior disease classification with an average AUC of 0.885 across multiple diseases. These recognized medical conditions, represented as text-based concepts in the aligned visual-language space, are then used to prompt LLM-based report generation. Enhanced by a retrieval-augmented generation mechanism that grounds outputs in similar historical cases, RadAlign delivers superior report quality with a GREEN score of 0.678, outperforming state-of-the-art methods' 0.634. Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI. Code is available at https://github.com/difeigu/RadAlign.

Related papers

Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs [4.273291010923853]
Large language models (LLMs) have demonstrated remarkable performance across various domains.<n>We propose Optimal Transport-Driven Radiology Report Generation (OTDRG) to align image features with disease labels extracted from reports.<n>OTDRG achieves state-of-the-art performance in both natural language generation (NLG) and clinical efficacy (CE) metrics.
arXiv Detail & Related papers (2025-07-05T05:48:48Z)
Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG [1.9374282535132377]
This study enhances interpretability in Chest X-ray (CXR) classification by using concept bottleneck models (CBMs) and a multi-agent Retrieval-Augmented Generation (RAG) system for report generation. By modeling relationships between visual features and clinical concepts, we create interpretable concept vectors that guide a multi-agent RAG system to generate radiology reports.
arXiv Detail & Related papers (2024-12-20T17:33:50Z)
Resource-Efficient Medical Report Generation using Large Language Models [3.2627279988912194]
Medical report generation is the task of automatically writing radiology reports for chest X-ray images. We propose a new framework leveraging vision-enabled Large Language Models (LLM) for the task of medical report generation.
arXiv Detail & Related papers (2024-10-21T05:08:18Z)
Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a textbfCountertextbfFactual textbfExplanations-based framework (CoFE) for radiology report generation. Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking what if'' scenarios. Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z)
Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation [10.46031380503486]
We introduce a novel method, textbfStructural textbfEntities extraction and patient indications textbfIncorporation (SEI) for chest X-ray report generation. We employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports. We propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications.
arXiv Detail & Related papers (2024-05-23T01:29:47Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG)<n>MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner.<n>We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information. The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning. In detail, the fundamental structure of our graph is pre-constructed from general knowledge. Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance. Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas. We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM)
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
A Self-Guided Framework for Radiology Report Generation [10.573538773141715]
A self-guided framework (SGF) is developed to generate medical reports with annotated disease labels. SGF uses unsupervised and supervised deep learning methods to mimic the process of human learning and writing. Our results highlight the capacity of the proposed framework to distinguish fined-grained visual details between words.
arXiv Detail & Related papers (2022-06-19T11:09:27Z)
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG) CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure. Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation [3.3978173451092437]
Radiology report generation aims at generating descriptive text from radiology images automatically. A typical setting consists of training encoder-decoder models on image-report pairs with a cross entropy loss. We propose a novel weakly supervised contrastive loss for medical report generation.
arXiv Detail & Related papers (2021-09-25T00:06:23Z)
Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge-Decoder (ASGK) to mimic radiologists' working patterns. ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.