Evaluation of GPT-4 for chest X-ray impression generation: A reader
study on performance and perception
- URL: http://arxiv.org/abs/2311.06815v1
- Date: Sun, 12 Nov 2023 11:40:57 GMT
- Title: Evaluation of GPT-4 for chest X-ray impression generation: A reader
study on performance and perception
- Authors: Sebastian Ziegelmayer, Alexander W. Marka, Nicolas Lenhart, Nadja
Nehls, Stefan Reischl, Felix Harder, Andreas Sauter, Marcus Makowski, Markus
Graf, and Joshua Gawlitza
- Abstract summary: GPT-4 was used to generate chest X-ray impressions based on different input modalities (image, text, text and image).
Our study revealed significant discrepancies between a radiological assessment and common automatic evaluation metrics depending on the model input.
The detection of AI-generated findings is subject to a bias whereby highly rated impressions are perceived as human-written.
- Score: 32.73124984242397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable generative capabilities of multimodal foundation models are
currently being explored for a variety of applications. Generating radiological
impressions is a challenging task that could significantly reduce the workload
of radiologists. In our study we explored and analyzed the generative abilities
of GPT-4 for Chest X-ray impression generation. To generate and evaluate
impressions of chest X-rays based on different input modalities (image, text,
text and image), a blinded radiological report was written for 25 cases from the
publicly available NIH dataset. GPT-4 was given the image, the findings section,
or both sequentially to generate an input-dependent impression. In a blinded,
randomized reading, four radiologists rated the impressions and were asked to
classify each impression's origin (human or AI), providing justification for
their decision. Lastly, text-model evaluation metrics and their correlation with
the radiological score (the sum of the four rating dimensions) were assessed.
According to
the radiological score, the human-written impression was rated highest,
although not significantly different from the text-based impressions. The
automated evaluation metrics showed moderate to substantial correlations with
the radiological score for the image-based impressions; however, individual
scores diverged widely across inputs, indicating insufficient representation of
radiological quality. Detection of AI-generated impressions varied by input and
was 61% for text-based impressions. Impressions classified as AI-generated had
significantly worse radiological scores even when written by a radiologist,
indicating potential bias. Our study revealed significant discrepancies between
a radiological assessment and common automatic evaluation metrics depending on
the model input. The detection of AI-generated findings is subject to a bias
whereby highly rated impressions are perceived as human-written.
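As a hedged illustration of the metric-correlation step described above, the sketch below computes a crude unigram-overlap score between generated and reference impressions and correlates it with radiologist ratings via Spearman's rank correlation. The metric, the example impressions, and the scores are invented placeholders, not the paper's actual data or evaluation pipeline.

```python
# Minimal sketch (not the paper's code): correlate a toy automatic text
# metric with radiologist ratings using Spearman's rank correlation.
from scipy.stats import spearmanr

def unigram_f1(candidate: str, reference: str) -> float:
    """Crude stand-in for standard text metrics: unigram-overlap F1."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = len(set(cand) & set(ref))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical generated/reference impressions and radiologist scores
# (the sum of four rating dimensions, mirroring the study design).
generated = ["no acute cardiopulmonary abnormality",
             "right lower lobe opacity concerning for pneumonia",
             "stable cardiomegaly without pulmonary edema"]
reference = ["no acute cardiopulmonary process",
             "right lower lobe consolidation suggestive of pneumonia",
             "unchanged cardiomegaly no pulmonary edema"]
radiologist_scores = [18, 14, 16]  # toy values

metric_scores = [unigram_f1(g, r) for g, r in zip(generated, reference)]
rho, p = spearmanr(metric_scores, radiologist_scores)
print(f"Spearman rho={rho:.2f} (p={p:.3f})")
```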
Related papers
- The current status of large language models in summarizing radiology report impressions [13.402769727597812]
The effectiveness of large language models (LLMs) in summarizing radiology report impressions remains unclear.
Three types of radiology reports, i.e., CT, PET-CT, and Ultrasound reports, are collected from Peking University Cancer Hospital and Institute.
We use the report findings to construct zero-shot, one-shot, and three-shot prompts with complete example reports to generate the impressions (a minimal prompt-construction sketch follows this entry).
arXiv Detail & Related papers (2024-06-04T09:23:30Z)
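As a hedged illustration of the prompting setup in this entry, the sketch below assembles zero-, one-, and three-shot prompts from (findings, impression) example pairs; the prompt wording and placeholder examples are assumptions, not the authors' actual prompts.

```python
# Minimal sketch (assumed prompt wording, placeholder example reports):
# build zero-/one-/three-shot prompts from (findings, impression) pairs.
EXAMPLES = [
    ("Findings: ...", "Impression: ..."),  # complete example report 1
    ("Findings: ...", "Impression: ..."),  # complete example report 2
    ("Findings: ...", "Impression: ..."),  # complete example report 3
]

def build_prompt(target_findings: str, n_shots: int) -> str:
    """Prepend n_shots worked examples, then ask for the new impression."""
    shots = "\n\n".join(f"{f}\n{i}" for f, i in EXAMPLES[:n_shots])
    task = f"{target_findings}\nImpression:"
    return f"{shots}\n\n{task}" if shots else task

for n in (0, 1, 3):  # the zero-, one-, and three-shot settings
    print(build_prompt("Findings: Enlarged cardiac silhouette.", n))
```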
- Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation [32.26270073540666]
The worldwide shortage of radiologists restricts access to expert care and imposes heavy workloads.
Recent progress in automated report generation with vision-language models offers clear potential for ameliorating the situation.
We build a state-of-the-art report generation system for chest radiographs, Flamingo-CXR, by fine-tuning a well-known vision-language foundation model on radiology data.
arXiv Detail & Related papers (2023-11-30T05:38:34Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information (a minimal architectural sketch follows this entry).
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
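A rough architectural sketch of that idea follows, with a torchvision ResNet-18 standing in for the unspecified CNN and assumed dimensions throughout; none of the module names, shapes, or hyperparameters come from the paper.

```python
# Minimal sketch (assumed backbone/shapes, not the paper's architecture):
# CNN visual features plus demographic embeddings feed a transformer decoder.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CxrReportModel(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        cnn = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # feature map
        self.visual_proj = nn.Linear(512, d_model)   # 512 = ResNet-18 channels
        self.demo_proj = nn.Linear(4, d_model)       # e.g. age/sex/... features
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, image, demographics, tokens):
        feats = self.backbone(image)                 # (B, 512, H', W')
        feats = feats.flatten(2).transpose(1, 2)     # (B, H'*W', 512)
        memory = torch.cat([self.visual_proj(feats),
                            self.demo_proj(demographics).unsqueeze(1)], dim=1)
        return self.head(self.decoder(self.tok_emb(tokens), memory))

model = CxrReportModel(vocab_size=1000)
logits = model(torch.randn(2, 3, 224, 224),          # batch of CXR images
               torch.randn(2, 4),                    # demographic vectors
               torch.randint(0, 1000, (2, 16)))      # report token ids
print(logits.shape)                                  # torch.Size([2, 16, 1000])
```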
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions.
In X-RGen, we seek to mimic the behaviour of human radiologists, breaking it down into four principal phases.
We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z)
- Improving Radiology Summarization with Radiograph and Anatomy Prompts [60.30659124918211]
We propose a novel anatomy-enhanced multimodal model to promote impression generation.
In detail, we first construct a set of rules to extract anatomies and put these prompts into each sentence to highlight anatomy characteristics.
We utilize a contrastive learning module to align these two representations at the overall level and use co-attention to fuse them at the sentence level (a minimal contrastive-alignment sketch follows this entry).
arXiv Detail & Related papers (2022-10-15T14:05:03Z)
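As a hedged sketch of the overall-level contrastive alignment mentioned in this entry: a symmetric InfoNCE loss that pulls paired image and text representations together; the loss form, temperature, and dimensions are assumptions, since the entry does not specify them.

```python
# Minimal sketch (assumed InfoNCE form): align paired image/text embeddings
# at the overall level; matched pairs are positives, all others negatives.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings of shape (B, D)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature      # (B, B) cosine-similarity matrix
    targets = torch.arange(img.size(0))       # diagonal entries = matched pairs
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```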
- Using Multi-modal Data for Improving Generalizability and Explainability of Disease Classification in Radiology [0.0]
Traditional datasets for radiological diagnosis tend to provide only the radiology image alongside the radiology report.
This paper utilizes the recently published Eye-Gaze dataset to perform an exhaustive study on the impact on performance and explainability of deep learning (DL) classification.
We find that the best classification performance of X-ray images is achieved with a combination of radiology report free-text and radiology image, with the eye-gaze data providing no performance boost.
arXiv Detail & Related papers (2022-07-29T16:49:05Z)
- Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
- Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning [17.315387269810426]
We built a database containing more than 12,000 CXR scans and radiological reports.
We developed a model based on a deep convolutional neural network and a recurrent network with an attention mechanism.
The model provides automated recognition of given scans and generation of reports.
arXiv Detail & Related papers (2020-06-18T08:12:54Z)