Related papers: RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

URL: http://arxiv.org/abs/2311.18681v1
Date: Thu, 30 Nov 2023 16:28:40 GMT
Title: RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
Authors: Chantal Pellegrini, Ege \"Ozsoy, Benjamin Busam, Nassir Navab, Matthias Keicher
Abstract summary: Conversational AI tools can generate and discuss clinically correct radiology reports for a given medical image. RaDialog is the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. Our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions.
Score: 53.20640629352422
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on github: https://github.com/ChantalMP/RaDialog.

Related papers

UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation [31.72930277939111]
We propose to transfer representations from CLIP, a large-scale pre-trained vision-language model, to better capture cross-modal semantics between images and texts. To enable efficient adaptation, we introduce UniCrossAdapter, lightweight adapter modules that are incorporated into CLIP and fine-tuned on the target task.
arXiv Detail & Related papers (2025-03-20T08:28:53Z)
Designing a Robust Radiology Report Generation System [1.0878040851637998]
This paper outlines the design of a robust radiology report generation system by integrating different modules and highlighting best practices. We believe that these best practices could improve automatic radiology report generation, augment radiologists in decision making, and expedite diagnostic workflow.
arXiv Detail & Related papers (2024-11-02T06:38:04Z)
Resource-Efficient Medical Report Generation using Large Language Models [3.2627279988912194]
Medical report generation is the task of automatically writing radiology reports for chest X-ray images. We propose a new framework leveraging vision-enabled Large Language Models (LLM) for the task of medical report generation.
arXiv Detail & Related papers (2024-10-21T05:08:18Z)
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions [8.50767187405446]
We propose D-Rax -- a domain-specific, conversational, radiologic assistance tool. We enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting. We observe statistically significant improvement in responses when evaluated for both open and close-ended conversations.
arXiv Detail & Related papers (2024-07-02T18:43:10Z)
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information. The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model. It can analyze and answer open-ended questions about chest radiographs. We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians. Recent studies have achieved promising results in automatic impression generation using large-scale medical text data. These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records. The proposed model is tested on two medical datasets, the Open-I, MIMIC-CXR, and the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
Cross-modal Memory Networks for Radiology Report Generation [30.13916304931662]
Cross-modal memory networks (CMN) are proposed to enhance the encoder-decoder framework for radiology report generation. Our model is able to better align information from radiology images and texts so as to help generating more accurate reports in terms of clinical indicators.
arXiv Detail & Related papers (2022-04-28T02:32:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.