GPT-4V Cannot Generate Radiology Reports Yet
- URL: http://arxiv.org/abs/2407.12176v4
- Date: Thu, 14 Nov 2024 21:34:59 GMT
- Title: GPT-4V Cannot Generate Radiology Reports Yet
- Authors: Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan
- Abstract summary: GPT-4V's purported strong multimodal abilities raise interest in using it to automate radiology report writing.
We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly on both lexical and clinical efficacy metrics.
- Score: 25.331936045860516
- Abstract: GPT-4V's purported strong multimodal abilities raise interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly on both lexical and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (ground-truth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which ground-truth conditions are present in the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given ground-truth conditions for report synthesis, its generated reports are less correct and less natural-sounding than those of a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.
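To make the decomposition concrete, here is a minimal sketch of the two-step evaluation, assuming the OpenAI Python client (v1+); the label set, prompts, and model name are illustrative stand-ins, not the authors' exact setup.

```python
# A minimal sketch, assuming the OpenAI Python client and base64-encoded
# chest X-rays. Prompts, labels, and model name are illustrative only.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# CheXpert-style condition labels commonly used with MIMIC-CXR.
CONDITIONS = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
              "Pleural Effusion", "Pneumonia", "Pneumothorax", "No Finding"]

def encode_image(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def predict_conditions(image_path):
    """Step 1 -- medical image reasoning: predict condition labels from the image."""
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumption: a GPT-4V-era model name
        messages=[{"role": "user", "content": [
            {"type": "text", "text": (
                "List every condition visible in this chest X-ray as a "
                f"comma-separated subset of: {', '.join(CONDITIONS)}")},
            {"type": "image_url", "image_url": {
                "url": f"data:image/jpeg;base64,{encode_image(image_path)}"}},
        ]}])
    return resp.choices[0].message.content

def synthesize_report(groundtruth_conditions):
    """Step 2 -- report synthesis: write a report from ground-truth labels alone."""
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{"role": "user", "content": (
            "Write the findings section of a chest X-ray report for a study "
            f"showing: {', '.join(groundtruth_conditions)}.")}])
    return resp.choices[0].message.content
```

Evaluating the two steps separately is what localizes the failure: per the abstract, step 1's predicted label distribution barely varies with the input image, while step 2's reports still trail a finetuned LLaMA-2 even given ground-truth conditions.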
Related papers
- Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback [10.826651024680169]
Radiologists play a crucial role by translating medical images into medical reports.
While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy.
We propose a scalable automated preference alignment technique for VLMs in radiology, focusing on chest X-ray (CXR) report generation.
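The summary does not spell out the alignment objective; as one common way to do preference alignment without human annotators, here is a minimal DPO-style loss in PyTorch, with the (chosen, rejected) report pairs assumed to come from an automated scorer rather than from the paper's exact method.

```python
# A generic DPO-style preference loss -- a sketch of what "preference alignment
# without human feedback" can look like, not this paper's published objective.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Each argument is the summed log-probability a model assigns to a report.

    "Chosen" reports could be selected automatically, e.g. by a factuality
    scorer, instead of by human annotators (an assumption for this sketch).
    """
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    return -F.logsigmoid(logits).mean()
```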
arXiv Detail & Related papers (2024-10-09T16:07:11Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluates the performance of Gemini, GPT-4, and four other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Pragmatic Radiology Report Generation [39.96409366755059]
When pneumonia is not found on a chest X-ray, should the report describe this negative observation or omit it?
We develop a framework to identify uninferable information from the image as a source of model hallucinations, and limit them by cleaning groundtruth reports.
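As a concrete (hypothetical) instance of such cleaning, the sketch below drops report sentences that reference prior studies or outside clinical information, which a single image cannot support; the paper's actual identification procedure may differ.

```python
# A heuristic "cleaning" pass over ground-truth reports: remove sentences a
# model could not infer from the image by itself. The keyword list is an
# illustrative assumption, not the paper's method.
import re

UNINFERABLE = re.compile(
    r"\b(compared to|prior|previous|interval|again seen|clinical history)\b",
    re.IGNORECASE)

def clean_report(report: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", report)
    return " ".join(s for s in sentences if not UNINFERABLE.search(s))

print(clean_report(
    "Cardiomegaly is again seen, unchanged compared to prior. No pneumonia."))
# -> "No pneumonia."
```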
arXiv Detail & Related papers (2023-11-28T19:00:03Z)
- Exploring the Boundaries of GPT-4 in Radiology [46.30976153809968]
GPT-4 has a sufficient level of radiology knowledge, with only occasional errors in complex contexts.
For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually-written impressions.
arXiv Detail & Related papers (2023-10-23T05:13:03Z)
- Replace and Report: NLP Assisted Radiology Report Generation [31.309987297324845]
We propose a template-based approach to generate radiology reports from radiographs.
This is the first attempt to generate chest X-ray radiology reports by first creating short sentences for abnormal findings and then substituting them into a normal report template.
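A toy version of the idea, with illustrative template text and region keys rather than the paper's own:

```python
# Keep a normal-report template keyed by anatomical region and substitute
# short abnormal-finding sentences where pathology is detected.
NORMAL_TEMPLATE = {
    "heart": "The cardiac silhouette is within normal limits.",
    "lungs": "The lungs are clear without focal consolidation.",
    "pleura": "No pleural effusion or pneumothorax.",
}

def build_report(abnormal_findings: dict) -> str:
    """abnormal_findings maps a region to a sentence describing its pathology."""
    return " ".join(abnormal_findings.get(region, normal_sentence)
                    for region, normal_sentence in NORMAL_TEMPLATE.items())

print(build_report({"lungs": "Patchy opacity in the right lower lobe."}))
# Heart and pleura sentences stay normal; only the lung sentence is replaced.
```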
arXiv Detail & Related papers (2023-06-19T10:04:42Z)
- DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis [54.93879264615525]
We propose DeltaNet to generate medical reports automatically.
DeltaNet employs three steps to generate a report.
We evaluate DeltaNet on a COVID-19 dataset, where DeltaNet outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-12T07:41:03Z)
- Transfer learning with weak labels from radiology reports: application to glioma change detection [0.2010294990327175]
We propose a combined use of weak labels (imprecise, but fast-to-create annotations) and Transfer Learning (TL).
Specifically, we explore inductive TL, where source and target domains are identical, but tasks are different due to a label shift.
We investigate the relationship between model size and TL, comparing a low-capacity VGG with a higher-capacity SEResNeXt.
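A minimal inductive-TL sketch in PyTorch, loosely mirroring this setting: a pretrained backbone is reused and only the head is retrained on weak labels. The frozen-layer choice, two-class head, and hyperparameters here are assumptions, not the paper's configuration.

```python
# Source task: ImageNet classification; target task: change detection trained
# on weak labels mined from radiology reports (same domain, shifted labels).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in backbone.features.parameters():
    p.requires_grad = False                      # freeze low-level features

backbone.classifier[-1] = nn.Linear(4096, 2)     # stable vs. changed

optimizer = torch.optim.Adam(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()                # weak labels used as-is
```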
arXiv Detail & Related papers (2022-10-18T09:15:27Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, as well as the general-purpose MS-COCO.
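One way to read the pipeline, sketched with an assumed prompt and a stand-in model name: the attention-based captioner produces a terse draft of findings, and a GPT model rewrites it into a fluent report.

```python
# Draft-then-rewrite sketch; the prompt and model name are assumptions, and the
# draft string stands in for a Show-Attend-Tell-style captioner's output.
from openai import OpenAI

client = OpenAI()

def expand_draft(draft_findings: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in; the paper used a GPT-3-class model
        messages=[{"role": "user", "content": (
            "Rewrite these draft chest X-ray findings as a complete, "
            f"well-formed radiology report:\n{draft_findings}")}])
    return resp.choices[0].message.content

print(expand_draft("cardiomegaly. low lung volumes. no effusion."))
```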
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Radiomics-Guided Global-Local Transformer for Weakly Supervised Pathology Localization in Chest X-Rays [65.88435151891369]
Radiomics-Guided Transformer (RGT) fuses *global* image information with *local* knowledge-guided radiomics information.
RGT consists of an image Transformer branch, a radiomics Transformer branch, and fusion layers that aggregate image and radiomic information.
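A schematic two-branch fusion in PyTorch: one Transformer encoder over image patch tokens, one over radiomics feature tokens, and a fusion encoder over the concatenated sequence. Dimensions, depths, and the pooling/head choices are placeholders, not RGT's published configuration.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, dim=256, num_classes=14):  # e.g. 14 CXR pathologies
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.image_branch = nn.TransformerEncoder(layer(), num_layers=2)
        self.radiomics_branch = nn.TransformerEncoder(layer(), num_layers=2)
        self.fusion = nn.TransformerEncoder(layer(), num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, image_tokens, radiomics_tokens):
        img = self.image_branch(image_tokens)          # global image context
        rad = self.radiomics_branch(radiomics_tokens)  # local, knowledge-guided
        fused = self.fusion(torch.cat([img, rad], dim=1))
        return self.head(fused.mean(dim=1))            # weak image-level labels

model = TwoBranchFusion()
logits = model(torch.randn(2, 196, 256), torch.randn(2, 32, 256))
```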
arXiv Detail & Related papers (2022-07-10T06:32:56Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)