CXR-LLAVA: a multimodal large language model for interpreting chest
X-ray images
- URL: http://arxiv.org/abs/2310.18341v3
- Date: Sun, 14 Jan 2024 13:29:15 GMT
- Title: CXR-LLAVA: a multimodal large language model for interpreting chest
X-ray images
- Authors: Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon
- Abstract summary: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs).
For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities.
The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists.
- Score: 3.0757789554622597
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists.
Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM, following the LLaVA network design. The model was then fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of its radiologic reports to human radiologists, to gauge its potential for autonomous reporting.
Results: The model achieved an average F1 score of 0.81 for six major pathological findings on the MIMIC internal test set and 0.62 for seven major pathological findings on the external test set, surpassing the F1 scores of GPT-4-Vision and Gemini-Pro-Vision on both test sets. In human radiologist evaluations on the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground-truth reports.
Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging their performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts. CXR-LLAVA is available at https://github.com/ECOFRI/CXR_LLAVA.
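The two-stage recipe described in the abstract (pre-train a vision transformer on labeled CXRs, then connect it to an LLM in the LLaVA style and fine-tune on report text) can be illustrated with a minimal sketch. The class name, the linear projector, and the feature dimensions below are assumptions for illustration, not the released CXR-LLAVA implementation:

```python
# Minimal sketch of a LLaVA-style vision-language bridge: a pre-trained
# vision transformer encodes the chest X-ray, a projection layer maps its
# patch features into the LLM's token-embedding space, and the LLM generates
# report text conditioned on those projected "image tokens".
# All names and dimensions are illustrative assumptions, not CXR-LLAVA's code.
import torch
import torch.nn as nn


class VisionLanguageBridge(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder             # e.g. a ViT pre-trained on labeled CXRs (Dataset 1)
        self.projector = nn.Linear(vision_dim, llm_dim)  # aligns image features with text embeddings
        self.llm = llm                                   # decoder-only LLM, assumed to accept embedding sequences

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        patch_feats = self.vision_encoder(pixel_values)  # (B, N, vision_dim) patch features
        image_tokens = self.projector(patch_feats)       # (B, N, llm_dim) "soft" image tokens
        # Prepend image tokens to the embedded text prompt and decode as usual.
        return self.llm(torch.cat([image_tokens, text_embeds], dim=1))
```

In LLaVA-style training, the vision encoder is typically kept frozen while the projector (and later the LLM) is tuned on image-report pairs, which broadly mirrors the staged training described in the abstract.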
Related papers
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
This work trains open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, making it a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - MAIRA-1: A specialised large multimodal model for radiology report generation [41.69727330319648]
We present a radiology-specific multimodal model for generating radiological reports from chest X-rays (CXRs).
Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with pre-trained vision encoders.
Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality.
arXiv Detail & Related papers (2023-11-22T19:45:40Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning.
Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z) - Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation [7.586632627817609]
Radiologists face high burnout rates, partly due to the increasing volume of Chest X-rays (CXRs) requiring interpretation and reporting.
Our proposed CXR report generator integrates elements of the radiologists' workflow and introduces a novel reward for reinforcement learning.
Results from our study demonstrate that the proposed model generates reports that are more aligned with radiologists' reports than state-of-the-art models.
arXiv Detail & Related papers (2023-07-19T05:41:14Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, as well as the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - Open-radiomics: A Collection of Standardized Datasets and a Technical
Protocol for Reproducible Radiomics Machine Learning Pipelines [0.0]
We introduce open-radiomics, a set of radiomics datasets and a comprehensive radiomics pipeline.
Experiments are conducted on the BraTS 2020 open-source Magnetic Resonance Imaging (MRI) dataset.
Unlike binWidth and image normalization, tumor subregion and imaging sequence significantly affected the performance of the models.
arXiv Detail & Related papers (2022-07-29T16:37:46Z) - Event-based clinical findings extraction from radiology reports with
pre-trained language model [0.22940141855172028]
We present a new corpus of radiology reports annotated with clinical findings.
The gold standard corpus contained a total of 500 annotated computed tomography (CT) reports.
We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT.
arXiv Detail & Related papers (2021-12-27T05:03:10Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Chest x-ray automated triage: a semiologic approach designed for
clinical implementation, exploiting different types of labels through a
combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z) - Exploration of Interpretability Techniques for Deep COVID-19
Classification using Chest X-ray Images [10.01138352319106]
Five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their ensemble are used to classify COVID-19, pneumonia, and healthy subjects from chest X-ray images.
The mean micro-F1 score of the individual models for COVID-19 classification ranges from 0.66 to 0.875, and reaches 0.89 for the ensemble of the network models; a minimal sketch of this ensemble evaluation pattern follows the list.
arXiv Detail & Related papers (2020-06-03T22:55:53Z)