MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
- URL: http://arxiv.org/abs/2403.15585v3
- Date: Fri, 29 Mar 2024 00:44:18 GMT
- Title: MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
- Authors: Mai A. Shaaban, Adnan Khan, Mohammad Yaqub,
- Abstract summary: Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions.
Efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records.
This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP) and visual grounding (VG)
Results demonstrate the SOTA performance of MedPromptX, achieving an 11% improvement in F1-score compared to the baselines.
- Score: 1.2903829793534272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions, but efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records (EHR). This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP) and visual grounding (VG) to combine imagery with EHR data for chest X-ray diagnosis. A pre-trained MLLM is utilized to complement the missing EHR information, providing a comprehensive understanding of patients' medical history. Additionally, FP reduces the necessity for extensive training of MLLMs while effectively tackling the issue of hallucination. Nevertheless, the process of determining the optimal number of few-shot examples and selecting high-quality candidates can be burdensome, yet it profoundly influences model performance. Hence, we propose a new technique that dynamically refines few-shot data for real-time adjustment to new patient scenarios. Moreover, VG aids in focusing the model's attention on relevant regions of interest in X-ray images, enhancing the identification of abnormalities. We release MedPromptX-VQA, a new in-context visual question answering dataset encompassing interleaved image and EHR data derived from MIMIC-IV and MIMIC-CXR databases. Results demonstrate the SOTA performance of MedPromptX, achieving an 11% improvement in F1-score compared to the baselines. Code and data are available at https://github.com/BioMedIA-MBZUAI/MedPromptX
Related papers
- R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation [7.4871243017824165]
This paper proposes a novel context-guided efficient X-ray medical report generation framework.
Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model.
arXiv Detail & Related papers (2024-08-19T07:15:11Z) - D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions [8.50767187405446]
We propose D-Rax -- a domain-specific, conversational, radiologic assistance tool.
We enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting.
We observe statistically significant improvement in responses when evaluated for both open and close-ended conversations.
arXiv Detail & Related papers (2024-07-02T18:43:10Z) - MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning [6.4136876268620115]
MLVICX is an approach to capture rich representations in the form of embeddings from chest X-ray images.
We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning.
arXiv Detail & Related papers (2024-03-18T06:19:37Z) - DDPM based X-ray Image Synthesizer [0.0]
We propose a Denoising Diffusion Probabilistic Model (DDPM) combined with a UNet architecture for X-ray image synthesis.
Our methodology employs over 3000 pneumonia X-ray images obtained from Kaggle for training.
Results demonstrate the effectiveness of our approach, as the model successfully generated realistic images with low Mean Squared Error (MSE)
arXiv Detail & Related papers (2024-01-03T04:35:58Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic
Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language
Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Vision-Language Modelling For Radiological Imaging and Reports In The
Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Variational Knowledge Distillation for Disease Classification in Chest
X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.