LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
- URL: http://arxiv.org/abs/2411.16523v1
- Date: Mon, 25 Nov 2024 16:10:05 GMT
- Title: LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
- Authors: Steven Song, Anirudh Subramanyam, Irene Madejski, Robert L. Grossman
- Abstract summary: We propose Label Boosted Retrieval Augmented Generation (LaB-RAG) to generate radiology reports.
We show that LaB-RAG achieves better results across natural language and radiology language metrics compared with other retrieval-based RRG methods.
We critique the use of a popular RRG metric, arguing it is possible to artificially inflate its results without true data-leakage.
- Abstract: In the current paradigm of image captioning, deep learning models are trained to generate text from image embeddings of latent features. We challenge the assumption that these latent features ought to be high-dimensional vectors which require model fine tuning to handle. Here we propose Label Boosted Retrieval Augmented Generation (LaB-RAG), a text-based approach to image captioning that leverages image descriptors in the form of categorical labels to boost standard retrieval augmented generation (RAG) with pretrained large language models (LLMs). We study our method in the context of radiology report generation (RRG), where the task is to generate a clinician's report detailing their observations from a set of radiological images, such as X-rays. We argue that simple linear classifiers over extracted image embeddings can effectively transform X-rays into text-space as radiology-specific labels. In combination with standard RAG, we show that these derived text labels can be used with general-domain LLMs to generate radiology reports. Without ever training our generative language model or image feature encoder models, and without ever directly "showing" the LLM an X-ray, we demonstrate that LaB-RAG achieves better results across natural language and radiology language metrics compared with other retrieval-based RRG methods, while attaining competitive results compared to other fine-tuned vision-language RRG models. We further present results of our experiments with various components of LaB-RAG to better understand our method. Finally, we critique the use of a popular RRG metric, arguing it is possible to artificially inflate its results without true data-leakage.
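The pipeline described in the abstract (a frozen image encoder, simple linear label classifiers, label-boosted retrieval, and a frozen general-domain LLM) can be illustrated with a minimal sketch. The label set, the similarity/label-overlap weighting, the prompt wording, and the helper names below are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a LaB-RAG-style pipeline, assuming precomputed frozen-encoder
# embeddings and CheXpert-style categorical labels. Names, label set, and the
# 0.5 overlap weight are hypothetical choices for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

LABELS = ["Cardiomegaly", "Edema", "Pleural Effusion", "Pneumonia"]  # illustrative subset

def train_label_classifiers(train_embeddings: np.ndarray, train_labels: np.ndarray):
    """Fit one binary logistic-regression classifier per label on frozen image embeddings."""
    return [
        LogisticRegression(max_iter=1000).fit(train_embeddings, train_labels[:, j])
        for j in range(train_labels.shape[1])
    ]

def predict_text_labels(classifiers, embedding: np.ndarray) -> list[str]:
    """Transform one X-ray's embedding into text-space as a list of categorical labels."""
    return [
        LABELS[j]
        for j, clf in enumerate(classifiers)
        if clf.predict(embedding.reshape(1, -1))[0] == 1
    ]

def retrieve_reports(query_emb, query_labels, corpus_embs, corpus_labels, corpus_reports, k=3):
    """Rank corpus reports by cosine similarity, boosted by label overlap with the query."""
    sim = corpus_embs @ query_emb / (
        np.linalg.norm(corpus_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    overlap = np.array([len(set(query_labels) & set(lbls)) for lbls in corpus_labels])
    score = sim + 0.5 * overlap  # weighting scheme is an assumption, not from the paper
    top = np.argsort(-score)[:k]
    return [corpus_reports[i] for i in top]

def build_prompt(query_labels, retrieved_reports) -> str:
    """Compose a text-only prompt; the LLM never sees the X-ray itself."""
    examples = "\n\n".join(f"Example report:\n{r}" for r in retrieved_reports)
    return (
        f"{examples}\n\n"
        f"Predicted findings for the current chest X-ray: {', '.join(query_labels) or 'No Finding'}.\n"
        "Write a radiology report consistent with these findings."
    )
    # The resulting prompt string would be passed to a frozen, general-domain LLM.
```

Boosting retrieval with label overlap is what lets a frozen, general-domain LLM condition on radiology-specific signal without the image encoder or language model ever being trained for the task.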
Related papers
- PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation [4.925253788789898]
Grounded radiology report generation (GRRG) includes the localisation of individual findings on the image.
Currently, there are no manually annotated chest X-ray (CXR) datasets to train GRRG models.
We present a dataset called PadChest-GR (Grounded-Reporting) derived from PadChest aimed at training GRRG models for CXR images.
arXiv Detail & Related papers (2024-11-07T19:06:17Z) - R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation [7.4871243017824165]
This paper proposes a novel context-guided efficient X-ray medical report generation framework.
Specifically, we introduce Mamba as the vision backbone with linear complexity, achieving performance comparable to that of a strong Transformer model.
arXiv Detail & Related papers (2024-08-19T07:15:11Z) - SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models [9.390882250428305]
Radiology Report Generation (R2Gen) demonstrates how Multi-modal Large Language Models (MLLMs) can automate the creation of accurate and coherent radiological reports.
Existing methods often hallucinate details in text-based reports that don't accurately reflect the image content.
We introduce a novel strategy, which improves the R2Gen task by integrating a self-refining mechanism into the MLLM framework.
arXiv Detail & Related papers (2024-04-27T13:46:23Z) - Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval [50.72924579220149]
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models, using labeled triplets of reference image, modification text, and target image.
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation [3.6664023341224827]
Contrastive X-Ray REport Match (X-REM) is a novel retrieval-based radiology report generation module.
X-REM uses an image-text matching score to measure the similarity of a chest X-ray image and radiology report for report retrieval.
arXiv Detail & Related papers (2023-03-29T04:00:47Z) - Radiomics-Guided Global-Local Transformer for Weakly Supervised Pathology Localization in Chest X-Rays [65.88435151891369]
Radiomics-Guided Transformer (RGT) fuses global image information with local knowledge-guided radiomics information.
RGT consists of an image Transformer branch, a radiomics Transformer branch, and fusion layers that aggregate image and radiomic information.
arXiv Detail & Related papers (2022-07-10T06:32:56Z) - Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment [39.60171837961607]
We develop a neural network model that is trained on both images and free-text to assess pulmonary edema severity from chest radiographs at inference time.
Our experimental results suggest that the joint image-text representation learning improves the performance of pulmonary edema assessment.
arXiv Detail & Related papers (2020-08-22T17:28:39Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)