Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis
- URL: http://arxiv.org/abs/2504.16047v1
- Date: Tue, 22 Apr 2025 17:20:34 GMT
- Title: Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis
- Authors: Frank Li, Hari Trivedi, Bardia Khosravi, Theo Dapamede, Mohammadreza Chavoshi, Abdulhameed Dere, Rohan Satya Isaac, Aawez Mansuri, Janice Newsome, Saptarshi Purkayastha, Judy Gichoya
- Abstract summary: This study evaluates three vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks. The models were assessed across classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs.
- Score: 4.803310914375717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models, trained on vast amounts of data using self-supervised techniques, have emerged as a promising frontier for advancing artificial intelligence (AI) applications in medicine. This study evaluates three different vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks. The models were assessed across classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs. Self-supervised RAD-DINO consistently excelled in segmentation tasks, while text-supervised CheXagent demonstrated superior classification performance. BiomedCLIP showed inconsistent performance across tasks. A custom segmentation model that integrates global and local features substantially improved performance for all foundation models, particularly for challenging pneumothorax segmentation. The findings highlight that pre-training methodology significantly influences model performance on specific downstream tasks. For fine-grained segmentation tasks, models trained without text supervision performed better, while text-supervised models offered advantages in classification and interpretability. These insights provide guidance for selecting foundation models based on specific clinical applications in radiology.
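The abstract describes a custom segmentation model that integrates global and local features from a frozen foundation encoder, but no implementation is given here. As a rough illustration only, the sketch below shows one plausible way to fuse a global image embedding with local patch tokens; all module names, dimensions, and the decoder design are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class GlobalLocalSegHead(nn.Module):
    """Hypothetical decoder fusing a frozen encoder's global embedding
    with its local patch tokens; not the paper's actual architecture."""
    def __init__(self, dim=768, num_classes=1, grid=14):
        super().__init__()
        self.grid = grid
        self.fuse = nn.Conv2d(dim * 2, dim, kernel_size=1)
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(dim, 256, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(256, 64, 2, stride=2), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, patch_tokens, global_token):
        # patch_tokens: (B, grid*grid, dim), e.g. ViT patch features
        # global_token: (B, dim), e.g. the CLS embedding
        b, n, d = patch_tokens.shape
        local = patch_tokens.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        glob = global_token[:, :, None, None].expand(-1, -1, self.grid, self.grid)
        fused = self.fuse(torch.cat([local, glob], dim=1))  # inject global context at every location
        return self.decode(fused)                           # mask logits at 4x patch resolution

head = GlobalLocalSegHead()
logits = head(torch.randn(2, 196, 768), torch.randn(2, 768))
print(logits.shape)  # torch.Size([2, 1, 56, 56])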
Related papers
- PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
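As a loose illustration of the "LDM as pre-trained feature extractor" idea, the sketch below captures intermediate activations from a denoiser stand-in via a forward hook; the network here is a tiny placeholder, not PathSegDiff's actual pathology-specific LDM:

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained latent-diffusion U-Net (hypothetical placeholder).
ldm_unet = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 128, 3, padding=1), nn.SiLU(),  # treat this as the "mid block"
    nn.Conv2d(128, 4, 3, padding=1),
)

features = {}
def grab(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# Register a forward hook on the layer whose activations serve as features.
ldm_unet[2].register_forward_hook(grab("mid"))

latent = torch.randn(1, 4, 32, 32)  # latent of an H&E patch (assumed shape)
_ = ldm_unet(latent)
print(features["mid"].shape)        # torch.Size([1, 128, 32, 32]) -> input to a segmentation head
```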
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
- AI-Assisted Colonoscopy: Polyp Detection and Segmentation using Foundation Models [0.10037949839020764]
In colonoscopy, 80% of the missed polyps could be detected with the help of Deep Learning models. In the search for algorithms capable of addressing this challenge, foundation models emerge as promising candidates. Their zero-shot or few-shot learning capabilities facilitate generalization to new data or tasks without extensive fine-tuning. A comprehensive evaluation of foundation models for polyp segmentation was conducted, assessing both detection and delimitation.
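Assessing both detection and delimitation typically reduces to overlap metrics on binary masks; a minimal sketch (the metric choices here are generic, not taken from the paper):

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Overlap metrics for binary masks (pred, gt: boolean arrays)."""
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + 1e-8)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-8)
    return dice, iou

pred = np.zeros((256, 256), bool); pred[50:150, 50:150] = True
gt = np.zeros((256, 256), bool); gt[60:160, 60:160] = True
print(dice_and_iou(pred, gt))  # (~0.81, ~0.68)
```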
arXiv Detail & Related papers (2025-03-31T14:20:53Z)
- Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation
We propose an enhanced Segment Anything Model (SAM) framework that utilizes annotation-efficient prompts generated in a fully unsupervised fashion. We adopt the direct preference optimization technique to design an optimal policy that enables the model to generate high-fidelity segmentations. Our framework achieves state-of-the-art performance in tasks such as lung segmentation, breast tumor segmentation, and organ segmentation across modalities including X-ray, ultrasound, and abdominal CT, demonstrating its effectiveness in low-annotation data scenarios.
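For reference, the generic form of the direct preference optimization objective mentioned above, applied to paired preferred/dispreferred outputs; the paper's exact policy design and log-probability parameterization are not shown here:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO on (preferred, dispreferred) pairs; log-probs come from the
    trainable policy and a frozen reference model."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

loss = dpo_loss(torch.tensor([-3.1]), torch.tensor([-4.0]),
                torch.tensor([-3.5]), torch.tensor([-3.6]))
print(loss)
```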
arXiv Detail & Related papers (2025-03-06T17:28:48Z)
- Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective [32.93871326428446]
Recent advances in artificial intelligence (AI) are revolutionizing medical imaging and computational pathology. A constant challenge in the analysis of digital Whole Slide Images (WSIs) is the problem of aggregating tens of thousands of tile-level image embeddings to a slide-level representation. This study conducts a benchmarking analysis of ten slide-level aggregation techniques across nine clinically relevant tasks.
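One common tile-to-slide aggregation baseline such a benchmark would include is gated attention pooling; a minimal sketch (dimensions assumed):

```python
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Attention-weighted aggregation of tile embeddings into one slide
    vector; one of many strategies such a benchmark might compare."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, tiles):                       # tiles: (num_tiles, dim)
        w = torch.softmax(self.score(tiles), dim=0)  # attention over tiles
        return (w * tiles).sum(dim=0)                # (dim,) slide-level embedding

slide = AttnPool()(torch.randn(30000, 512))
print(slide.shape)  # torch.Size([512])
```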
arXiv Detail & Related papers (2024-07-10T17:00:57Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
LLaVA-Rad inference is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
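A hypothetical sketch of a GPT-4-based factuality scorer in the spirit of CheXprompt; the actual prompt, error rubric, and scoring protocol are defined in the paper, not here:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def score_report(generated: str, reference: str) -> str:
    """Generic factuality check via GPT-4; prompt wording is an assumption."""
    prompt = (
        "Count the clinically significant factual errors in the candidate "
        f"radiology report relative to the reference.\n\nReference:\n{reference}"
        f"\n\nCandidate:\n{generated}\n\nAnswer with a single integer."
    )
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}], temperature=0
    )
    return resp.choices[0].message.content

print(score_report("Mild cardiomegaly. No pneumothorax.",
                   "Heart size is normal. No pneumothorax."))
```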
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation [0.8897689150430447]
We conduct the first systematic benchmark study for variants of 3D U-shaped models.
Our study examines the impact of different attention mechanisms, the number of resolution stages, and network configurations on segmentation accuracy and computational complexity.
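Benchmarking accuracy against computational complexity usually starts from parameter counts and timed forward passes; a toy sketch (not the paper's measurement protocol):

```python
import time
import torch
import torch.nn as nn

def complexity(model: nn.Module, shape=(1, 1, 96, 96, 96)):
    """Rough cost profile for comparing 3D variants: parameter count plus
    a timed forward pass on a dummy volume (illustrative only)."""
    n_params = sum(p.numel() for p in model.parameters())
    x = torch.randn(*shape)
    with torch.no_grad():
        t0 = time.time()
        model(x)
        dt = time.time() - t0
    return n_params, dt

toy = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv3d(8, 1, 3, padding=1))
print(complexity(toy))  # (parameter count, seconds) for a stand-in network
```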
arXiv Detail & Related papers (2024-02-05T17:43:02Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
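Text-to-image retrieval benchmarks of this kind are typically scored with recall@k over paired embeddings; a minimal sketch (embedding dimension and the i-th-text-pairs-with-i-th-image convention are assumptions):

```python
import torch
import torch.nn.functional as F

def recall_at_k(img_emb, txt_emb, k=5):
    """For each report embedding, check whether its paired X-ray is among
    the k nearest images by cosine similarity."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    sims = txt @ img.T                                        # (N, N) similarities
    topk = sims.topk(k, dim=1).indices
    match = (topk == torch.arange(len(txt))[:, None]).any(dim=1)
    return match.float().mean().item()

print(recall_at_k(torch.randn(100, 256), torch.randn(100, 256)))
```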
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest X-ray (CXR) identification performance through lung segmentation.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing the lung region in CXR images, and a CXR classification model whose backbone is a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR datasets.
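The cascade wiring can be sketched as segment, crop, classify; the modules below are trivial placeholders, not XLSor or the MoCo-pre-trained classifier:

```python
import torch
import torch.nn as nn

# Placeholder stages; real models would replace both.
segmenter = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Sigmoid())
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1, 2))

def cascade(cxr):                          # cxr: (1, 1, H, W)
    mask = segmenter(cxr) > 0.5            # stage 1: lung localization
    ys, xs = torch.where(mask[0, 0])       # bounding box of the lung field
    crop = cxr[:, :, ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return classifier(crop)                # stage 2: abnormality logits

print(cascade(torch.randn(1, 1, 224, 224)).shape)  # torch.Size([1, 2])
```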
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
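A toy sketch of the swallow-level/study-level split: per-swallow feature vectors (random stand-ins for deep-model outputs) are pooled per study and fed to a feature-based classifier; all names and shapes are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def study_features(swallow_feats):  # (n_swallows, d) -> (2*d,)
    """Pool swallow-level vectors into one study-level feature vector."""
    return np.concatenate([swallow_feats.mean(0), swallow_feats.std(0)])

X = np.stack([study_features(rng.normal(size=(10, 16))) for _ in range(40)])
y = rng.integers(0, 2, size=40)     # toy diagnosis labels
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```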
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Training custom modality-specific U-Net models with weak localizations for improved Tuberculosis segmentation and localization [0.6999740786886535]
U-Net segmentation models have demonstrated superior performance compared to methods based on conventional handcrafted features.
We train custom chest-X-ray modality-specific U-Net models for semantic segmentation of Tuberculosis-consistent findings.
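One plausible reading of "weak localizations" as training targets is rasterizing bounding boxes into binary pseudo-masks for U-Net supervision; the paper's actual weak-label scheme may differ:

```python
import numpy as np

def boxes_to_pseudomask(boxes, shape=(512, 512)):
    """Rasterize (x0, y0, x1, y1) boxes into a binary pseudo-mask
    (assumed weak-label scheme, for illustration only)."""
    mask = np.zeros(shape, dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1
    return mask

print(boxes_to_pseudomask([(30, 40, 120, 160)]).sum())  # 90 * 120 = 10800
```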
arXiv Detail & Related papers (2021-02-21T14:03:49Z)