Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data
- URL: http://arxiv.org/abs/2311.11097v1
- Date: Sat, 18 Nov 2023 14:52:26 GMT
- Title: Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data
- Authors: Nurbanu Aksoy, Nishant Ravikumar and Alejandro F Frangi
- Abstract summary: This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
- Score: 55.17268696112258
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Medical image interpretation is central to most clinical applications such as
disease diagnosis, treatment planning, and prognostication. In clinical
practice, radiologists examine medical images and manually compile their
findings into reports, which can be a time-consuming process. Automated
approaches to radiology report generation, therefore, can reduce radiologist
workload and improve efficiency in the clinical pathway. While recent
deep-learning approaches for automated report generation from medical images
have seen some success, most studies have relied on image-derived features
alone, ignoring non-imaging patient data. Although a few studies have included
word-level context along with the image, the use of patient demographics
is still unexplored. This paper proposes a novel multi-modal transformer
network that integrates chest x-ray (CXR) images and associated patient
demographic information, to synthesise patient-specific radiology reports. The
proposed network uses a convolutional neural network to extract visual features
from CXRs and a transformer-based encoder-decoder network that combines the
visual features with semantic text embeddings of patient demographic
information, to synthesise full-text radiology reports. Data from two public
databases were used to train and evaluate the proposed approach. CXRs and
reports were extracted from the MIMIC-CXR database and combined with the
corresponding patients' data from MIMIC-IV. Across the evaluation metrics
used, including patient demographic information improved the quality of the
generated reports relative to a baseline network trained on CXRs alone. The
proposed approach shows potential for enhancing
radiology report generation by leveraging rich patient metadata and combining
semantic text embeddings derived therefrom with medical image-derived visual
features.
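As a concrete illustration of the pipeline the abstract describes, the following is a minimal PyTorch sketch, not the authors' code: a CNN backbone extracts spatial CXR features, demographic tokens are embedded as text, and a transformer encoder-decoder fuses both streams to predict report tokens. The vocabulary size, layer dimensions, and tokenisation are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DemographicConditionedReportGenerator(nn.Module):
    """Hypothetical sketch: CXR features + demographic embeddings -> report."""
    def __init__(self, vocab_size=10000, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        cnn = resnet50(weights=None)
        # Keep only the convolutional trunk, preserving the spatial feature map.
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])
        self.visual_proj = nn.Linear(2048, d_model)  # 2048 = ResNet-50 channels
        self.token_embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, cxr, demo_tokens, report_tokens):
        # cxr: (B, 3, H, W); demo_tokens / report_tokens: (B, L) token ids.
        fmap = self.backbone(cxr)                                 # (B, 2048, h, w)
        vis = self.visual_proj(fmap.flatten(2).transpose(1, 2))   # (B, h*w, d)
        demo = self.token_embed(demo_tokens)                      # (B, Ld, d)
        # Encode visual and demographic tokens jointly as the decoder memory.
        memory = self.encoder(torch.cat([vis, demo], dim=1))
        tgt = self.token_embed(report_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(
            report_tokens.size(1)).to(cxr.device)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)                                  # next-token logits
```

Training such a model would minimise cross-entropy between these logits and the shifted report tokens; the baseline comparison in the abstract corresponds to dropping the demographic stream from the encoder input.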
Related papers
- D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions [8.50767187405446]
We propose D-Rax -- a domain-specific, conversational, radiologic assistance tool.
We enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting.
We observe statistically significant improvements in responses when evaluating both open- and closed-ended conversations.
arXiv Detail & Related papers (2024-07-02T18:43:10Z)
- The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It [12.61239008314719]
This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation.
Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we incorporate detailed patient information such as vital signs, medications, and clinical history to enhance diagnostic accuracy.
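As a toy illustration of how such auxiliary data might be injected into a multimodal language model, the helper below serialises structured patient context into prompt text; the field names and formatting are hypothetical, not the paper's scheme.

```python
def serialize_patient_context(vitals: dict, medications: list, history: str) -> str:
    """Render structured patient data as plain text for a model prompt."""
    lines = ["Patient context:"]
    lines += [f"  {name}: {value}" for name, value in vitals.items()]
    if medications:
        lines.append("  Medications: " + ", ".join(medications))
    if history:
        lines.append("  History: " + history)
    return "\n".join(lines)

# Example usage with made-up values:
prompt = serialize_patient_context(
    {"heart rate": "92 bpm", "SpO2": "94%"},
    ["metoprolol", "furosemide"],
    "Chronic heart failure; prior CABG.",
) + "\nDescribe the findings in the attached chest X-ray."
```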
arXiv Detail & Related papers (2024-06-19T03:25:31Z)
- Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation [10.46031380503486]
We introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation.
We employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports.
We propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications.
arXiv Detail & Related papers (2024-05-23T01:29:47Z)
- RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance [53.20640629352422]
Conversational AI tools can generate and discuss clinically correct radiology reports for a given medical image.
RaDialog is the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog.
Our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions.
arXiv Detail & Related papers (2023-11-30T16:28:40Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
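One common way to realise such a shared space, sketched below under the assumption of a CLIP-style contrastive objective (the paper explores several candidate methods, so this is illustrative only), is to normalise both embeddings and score all image-text pairs; text-to-image retrieval then reduces to ranking by cosine similarity.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Project both modalities onto the unit sphere and score every pair.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                       # (B, B) similarities
    labels = torch.arange(img.size(0), device=logits.device)   # diagonal matches
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

def text_to_image_retrieval(txt_emb, img_emb, k=5):
    # Rank candidate images for each query report; return top-k indices.
    sims = F.normalize(txt_emb, dim=-1) @ F.normalize(img_emb, dim=-1).t()
    return sims.topk(k, dim=-1).indices
```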
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO dataset.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Cross-modal Memory Networks for Radiology Report Generation [30.13916304931662]
Cross-modal memory networks (CMN) are proposed to enhance the encoder-decoder framework for radiology report generation.
Our model is able to better align information from radiology images and texts, helping to generate more accurate reports in terms of clinical indicators (see the sketch after this entry).
arXiv Detail & Related papers (2022-04-28T02:32:53Z)
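To make the memory idea concrete, here is a minimal sketch of a shared memory matrix queried by features from either modality; the slot count, dimensions, and single-head attention form are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class SharedCrossModalMemory(nn.Module):
    """Hypothetical shared memory consulted by both image and text features."""
    def __init__(self, num_slots=256, d_model=512):
        super().__init__()
        # One learnable memory matrix shared across modalities.
        self.memory = nn.Parameter(torch.randn(num_slots, d_model))

    def forward(self, features):
        # features: (B, L, d_model) from either the image or text encoder.
        attn = torch.softmax(features @ self.memory.t(), dim=-1)  # (B, L, slots)
        response = attn @ self.memory                             # (B, L, d_model)
        return features + response  # memory-augmented features
```

Because the same matrix serves both modalities, image and text features that attend to the same slots are pulled toward a shared representation, which is the alignment effect the entry above describes.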