Utility of Multimodal Large Language Models in Analyzing Chest X-ray with Incomplete Contextual Information
- URL: http://arxiv.org/abs/2410.07111v1
- Date: Fri, 20 Sep 2024 01:42:53 GMT
- Title: Utility of Multimodal Large Language Models in Analyzing Chest X-ray with Incomplete Contextual Information
- Authors: Choonghan Kim, Seonhee Cho, Joo Heung Yoon
- Abstract summary: Large language models (LLMs) are gaining use in clinical settings, but their performance can suffer from incomplete radiology reports.
We tested whether multimodal LLMs (using text and images) could improve accuracy and understanding in chest radiography reports.
- Score: 0.8602553195689513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Large language models (LLMs) are gaining use in clinical settings, but their performance can suffer from incomplete radiology reports. We tested whether multimodal LLMs (using text and images) could improve accuracy and understanding in chest radiography reports, making them more effective for clinical decision support. Purpose: To assess the robustness of LLMs in generating accurate impressions from chest radiography reports using both incomplete data and multimodal data. Materials and Methods: We used 300 radiology image-report pairs from the MIMIC-CXR database. Three LLMs (OpenFlamingo, MedFlamingo, IDEFICS) were tested in both text-only and multimodal formats. Impressions were first generated from the full text, then tested by removing 20%, 50%, and 80% of the text. The impact of adding images was evaluated using chest X-rays, and model performance was compared using three metrics with statistical analysis. Results: The text-only models (OpenFlamingo, MedFlamingo, IDEFICS) had similar performance (ROUGE-L: 0.39 vs. 0.21 vs. 0.21; F1RadGraph: 0.34 vs. 0.17 vs. 0.17; F1CheXbert: 0.53 vs. 0.40 vs. 0.40), with OpenFlamingo performing best on complete text (p<0.001). Performance declined with incomplete data across all models. However, adding images significantly boosted the performance of MedFlamingo and IDEFICS (p<0.001), equaling or surpassing OpenFlamingo, even with incomplete text. Conclusion: LLMs may produce low-quality outputs with incomplete radiology data, but multimodal LLMs can improve reliability and support clinical decision-making. Keywords: Large language model; multimodal; semantic analysis; Chest Radiography; Clinical Decision Support;
Related papers
- Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs [4.273291010923853]
Large language models (LLMs) have demonstrated remarkable performance across various domains. We propose Optimal Transport-Driven Radiology Report Generation (OTDRG) to align image features with disease labels extracted from reports. OTDRG achieves state-of-the-art performance in both natural language generation (NLG) and clinical efficacy (CE) metrics.
arXiv Detail & Related papers (2025-07-05T05:48:48Z) - Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems [1.1373722549440357]
We compare a rule-based algorithm (RBA), RadBERT, and three lightweight open-weight LLMs for multi-disease labeling of chest, abdomen, and pelvis CT reports. Performance was evaluated using Cohen's Kappa and micro/macro-averaged F1 scores.
arXiv Detail & Related papers (2025-06-03T18:00:08Z) - RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment [10.67889367763112]
RadAlign is a novel framework that combines the predictive accuracy of vision-language models with the reasoning capabilities of large language models.
Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI.
arXiv Detail & Related papers (2025-01-13T17:55:32Z) - UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities [68.12889379702824]
Vision-Language Models (VLMs) trained via contrastive learning have achieved notable success in natural image tasks.
UniMed is a large-scale, open-source multi-modal medical dataset comprising over 5.3 million image-text pairs.
We trained UniMed-CLIP, a unified VLM for six modalities, achieving notable gains in zero-shot evaluations.
arXiv Detail & Related papers (2024-12-13T18:59:40Z) - LLM Robustness Against Misinformation in Biomedical Question Answering [50.98256373698759]
The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering.
We evaluate the effectiveness and robustness of four LLMs against misinformation in answering biomedical questions.
arXiv Detail & Related papers (2024-10-27T16:23:26Z) - R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation [7.4871243017824165]
This paper proposes a novel context-guided efficient X-ray medical report generation framework.
Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model.
arXiv Detail & Related papers (2024-08-19T07:15:11Z) - The current status of large language models in summarizing radiology report impressions [13.402769727597812]
The effectiveness of large language models (LLMs) in summarizing radiology report impressions remains unclear.
Three types of radiology reports, i.e., CT, PET-CT, and Ultrasound reports, are collected from Peking University Cancer Hospital and Institute.
We use the report findings to construct the zero-shot, one-shot, and three-shot prompts with complete example reports to generate the impressions.
arXiv Detail & Related papers (2024-06-04T09:23:30Z) - CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates it can identify organs and abnormalities in a zero-shot manner using natural languages.
arXiv Detail & Related papers (2024-04-23T17:59:01Z) - MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis [1.2903829793534272]
Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions.
Efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records.
This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG) for chest X-ray diagnosis.
Results demonstrate the SOTA performance of MedPromptX, achieving an 11% improvement in F1-score compared to the baselines.
arXiv Detail & Related papers (2024-03-22T19:19:51Z) - RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance [53.20640629352422]
Conversational AI tools can generate and discuss clinically correct radiology reports for a given medical image.
RaDialog is the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog.
Our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions.
arXiv Detail & Related papers (2023-11-30T16:28:40Z) - CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images [3.0757789554622597]
This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs)
For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities.
The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists.
arXiv Detail & Related papers (2023-10-22T06:22:37Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - CLARA: Clinical Report Auto-completion [56.206459591367405]
Clinical Report Auto-completion (CLARA) is an interactive method that generates reports sentence by sentence based on doctors' anchor words and partially completed sentences.
In our experimental evaluation, CLARA achieved 0.393 CIDEr and 0.248 BLEU-4 on X-ray reports and 0.482 CIDEr and 0.491 BLEU-4 for EEG reports for sentence-level generation.
arXiv Detail & Related papers (2020-02-26T18:45:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.