Medical Image Captioning via Generative Pretrained Transformers
- URL: http://arxiv.org/abs/2209.13983v1
- Date: Wed, 28 Sep 2022 10:27:10 GMT
- Title: Medical Image Captioning via Generative Pretrained Transformers
- Authors: Alexander Selivanov, Oleg Y. Rogov, Daniil Chesakov, Artem Shelmanov,
Irina Fedulova and Dmitry V. Dylov
- Abstract summary: We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
- Score: 57.308920993032274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the automatic clinical caption generation problem with a
model that combines the analysis of frontal chest X-Ray scans with structured
patient information from the radiology records. We combine two language models,
the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive
radiology records. The proposed combination of these models generates a textual
summary with the essential information about pathologies found, their location,
and the 2D heatmaps localizing each pathology on the original X-Ray scans. The
proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on
the general-purpose MS-COCO. The results, measured with natural language
assessment metrics, demonstrate the model's applicability to chest X-Ray image
captioning.
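As an illustration of the pipeline the abstract describes (not the paper's actual implementation), a Show-Attend-Tell-style encoder computes soft attention over a grid of visual features, and the resulting context vector conditions a GPT-style decoder. The minimal numpy sketch below shows only the attention step; all names are illustrative:

```python
import numpy as np

def soft_attention(features, query):
    """Soft attention over a grid of visual features.

    features: (N, d) array of visual features (e.g. CNN spatial cells)
    query:    (d,) decoder hidden state used to score each cell
    Returns the attended context vector and the attention weights
    (the weights double as a 2D heatmap once reshaped to the grid).
    """
    scores = features @ query                 # (N,) dot-product scores
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    context = weights @ features              # (d,) weighted sum of features
    return context, weights

# Toy usage: 4 spatial cells with 3-dim features, uniform query
feats = np.ones((4, 3))
ctx, w = soft_attention(feats, np.zeros(3))
```
In the full model, `ctx` would be fed to the language model at each decoding step, and `w` reshaped into the per-pathology heatmap overlaid on the X-Ray scan.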
Related papers
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z) - Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation [10.46031380503486]
We introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation.
We employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports.
We propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications.
arXiv Detail & Related papers (2024-05-23T01:29:47Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
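One way to picture the fusion this entry describes is to project both modalities into a shared model dimension and concatenate them as one token sequence for the transformer. This is a hedged sketch of that idea in numpy, not the paper's implementation; function and parameter names are illustrative:

```python
import numpy as np

def build_multimodal_sequence(visual_feats, demo_embeds, W_v, W_t):
    """Fuse CXR visual features with demographic text embeddings.

    visual_feats: (Nv, d_v) features from a CNN backbone
    demo_embeds:  (Nt, d_t) semantic embeddings of demographic fields
    W_v, W_t:     linear projections into a shared d_model dimension
    Returns a (Nv + Nt, d_model) sequence for a transformer encoder.
    """
    v_tokens = visual_feats @ W_v   # project visual cells to d_model
    t_tokens = demo_embeds @ W_t    # project demographic fields to d_model
    return np.concatenate([v_tokens, t_tokens], axis=0)
```
A transformer encoder over this joint sequence lets self-attention relate image regions to patient attributes before decoding the report.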
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report
Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - LIMITR: Leveraging Local Information for Medical Image-Text
Representation [17.102338932907294]
This paper focuses on chest X-ray images and their corresponding radiological reports.
It presents a new model that learns a joint X-ray image & report representation.
arXiv Detail & Related papers (2023-03-21T11:20:34Z) - Self adaptive global-local feature enhancement for radiology report
generation [10.958641951927817]
We propose a novel framework AGFNet to dynamically fuse the global and anatomy region feature to generate multi-grained radiology report.
Firstly, we extract important anatomy region features and global features from the input chest X-ray (CXR) images.
Then, with the region features and the global features as input, our proposed self-adaptive fusion gate module can dynamically fuse multi-granularity information.
Finally, the captioning generator generates the radiology reports from the multi-granularity features.
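The fusion gate described above can be sketched as a learned sigmoid gate that decides, per dimension, how much of the global versus the region feature to keep. This is a minimal numpy illustration under assumed shapes, not AGFNet's actual module; `W` and `b` stand in for learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion_gate(global_feat, region_feat, W, b):
    """Self-adaptive gated fusion of global and region features.

    global_feat, region_feat: (d,) feature vectors
    W: (2d, d) learned projection, b: (d,) learned bias
    The gate g is computed from both inputs, so the mixing ratio
    adapts to the content of each example.
    """
    g = sigmoid(np.concatenate([global_feat, region_feat]) @ W + b)  # (d,) in (0, 1)
    return g * global_feat + (1.0 - g) * region_feat
```
With zero-initialized `W` and `b`, the gate is 0.5 everywhere and the module reduces to averaging the two features; training moves it away from that neutral point.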
arXiv Detail & Related papers (2022-11-21T11:50:42Z) - Learning Semi-Structured Representations of Radiology Reports [10.134080761449093]
Given a corpus of radiology reports, researchers are often interested in identifying a subset of reports describing a particular medical finding.
Recent studies proposed mapping free-text statements in radiology reports to semi-structured strings of terms taken from a limited vocabulary.
This paper aims to present an approach for the automatic generation of semi-structured representations of radiology reports.
arXiv Detail & Related papers (2021-12-20T18:53:41Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report
Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.