From Embeddings to Accuracy: Comparing Foundation Models for Radiographic Classification
- URL: http://arxiv.org/abs/2505.10823v2
- Date: Wed, 03 Sep 2025 20:41:40 GMT
- Title: From Embeddings to Accuracy: Comparing Foundation Models for Radiographic Classification
- Authors: Xue Li, Jameson Merkow, Noel C. F. Codella, Alberto Santamaria-Pang, Naiteek Sangani, Alexander Ersoy, Christopher Burt, John W. Garrett, Richard J. Bruce, Joshua D. Warner, Tyler Bradshaw, Ivan Tarapov, Matthew P. Lungren, Alan B. McMillan,
- Abstract summary: We evaluate embeddings from seven foundation models for training lightweight adapters in radiography classification. The combination of MedImageInsight embeddings with an SVM or MLP adapter achieved the highest mean area under the curve (mAUC) of 93.1%. These lightweight adapters are computationally efficient, training in minutes and performing inference in seconds on a CPU, making them practical for clinical use.
- Score: 33.96915720287914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models provide robust embeddings for diverse tasks, including medical imaging. We evaluate embeddings from seven general and medical-specific foundation models (e.g., DenseNet121, BiomedCLIP, MedImageInsight, Rad-DINO, CXR-Foundation) for training lightweight adapters in multi-class radiography classification. Using a dataset of 8,842 radiographs across seven classes, we trained adapters with algorithms like K-Nearest Neighbors, logistic regression, SVM, random forest, and MLP. The combination of MedImageInsight embeddings with an SVM or MLP adapter achieved the highest mean area under the curve (mAUC) of 93.1%. This performance was statistically superior to other models, including MedSigLIP with an MLP (91.0%), Rad-DINO with an SVM (90.7%), and CXR-Foundation with logistic regression (88.6%). In contrast, models like BiomedCLIP (82.8%) and Med-Flamingo (78.5%) showed lower performance. Crucially, these lightweight adapters are computationally efficient, training in minutes and performing inference in seconds on a CPU, making them practical for clinical use. A fairness analysis of the top-performing MedImageInsight adapter revealed minimal performance disparities across patient gender (within 1.8%) and age groups (std. dev < 1.4%), with no significant statistical differences. These findings confirm that embeddings from specialized foundation models, particularly MedImageInsight, can power accurate, efficient, and equitable diagnostic tools using simple, lightweight adapters.
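As a concrete (and simplified) illustration of the adapter pipeline above, the sketch below trains nothing heavier than a K-Nearest-Neighbors adapter on precomputed embeddings and scores it with a mean one-vs-rest AUC; the synthetic embeddings, the value of k, and the class count are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def knn_predict_proba(train_emb, train_y, test_emb, k=5, n_classes=7):
    """Simple KNN adapter over frozen embeddings: class probabilities
    are the fraction of the k nearest training embeddings per class."""
    # Pairwise Euclidean distances, shape (n_test, n_train)
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest
    votes = train_y[nn]                         # (n_test, k) neighbor labels
    return np.stack([(votes == c).mean(axis=1) for c in range(n_classes)], axis=1)

def auc_ovr(y_true, scores):
    """Mean one-vs-rest AUC via the rank statistic (Mann-Whitney U)."""
    aucs = []
    for c in range(scores.shape[1]):
        pos = scores[y_true == c, c]
        neg = scores[y_true != c, c]
        if len(pos) == 0 or len(neg) == 0:
            continue
        # P(score_pos > score_neg), counting ties as 0.5
        gt = (pos[:, None] > neg[None, :]).mean()
        eq = (pos[:, None] == neg[None, :]).mean()
        aucs.append(gt + 0.5 * eq)
    return float(np.mean(aucs))
```

On well-separated synthetic clusters this adapter reaches a mAUC near 1.0 in milliseconds on a CPU, which is the practical appeal the abstract highlights.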
Related papers
- Taylor-Series Expanded Kolmogorov-Arnold Network for Medical Imaging Classification [0.0]
This study introduces Kolmogorov-Arnold Networks (KANs) for accurate medical image classification with limited, diverse datasets.
The models include SBTAYLOR-KAN, integrating B-splines with Taylor series; SBRBF-KAN, combining B-splines with Radial Basis Functions; and SBWAVELET-KAN, embedding B-splines in Morlet wavelet transforms.
The models were evaluated on brain MRI, chest X-rays, tuberculosis X-rays, and skin lesion images without preprocessing.
arXiv Detail & Related papers (2025-09-17T04:33:54Z) - Enhanced Multi-Class Classification of Gastrointestinal Endoscopic Images with Interpretable Deep Learning Model [0.7349657385817541]
This research introduces a novel approach to enhance classification accuracy using 8,000 labeled endoscopic images from the Kvasir dataset.
The proposed architecture eliminates reliance on data augmentation while preserving moderate model complexity.
The model achieves a test accuracy of 94.25%, alongside precision and recall of 94.29% and 94.24%, respectively.
arXiv Detail & Related papers (2025-03-02T08:07:50Z) - Deep learning and classical computer vision techniques in medical image analysis: Case studies on brain MRI tissue segmentation, lung CT COPD registration, and skin lesion classification [0.0]
This study is the first to systematically evaluate segmentation, registration, and classification tasks across multiple imaging modalities.
For brain tissue segmentation, 3D DL models outperformed 2D and patch-based models, with nnU-Net achieving a Dice score of 0.9397.
In lung CT registration, classical Elastix-based methods outperformed DL models, achieving a minimum Target Registration Error (TRE) of 6.68 mm.
For skin lesion classification, ensembles of DL models such as InceptionResNetV2 and ResNet50 excelled, achieving up to 90.44% and 93.62% accuracy for binary and multi-class classification, respectively.
arXiv Detail & Related papers (2025-02-26T16:05:08Z) - MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention [1.2277343096128712]
We propose to leverage the advanced segmentation capabilities of the Segment Anything Model 2 (SAM2) as a visual prompting cue to help the visual encoder in CLIP (Contrastive Language-Image Pretraining).
This helps the model focus on highly discriminative regions, without getting distracted by visually similar background features.
We evaluate our method on diverse medical datasets including X-rays, CT scans, and MRI images, and report accuracies of 71%, 81%, 86%, and 58% for the proposed approach.
arXiv Detail & Related papers (2025-01-07T14:49:12Z) - Embeddings are all you need! Achieving High Performance Medical Image Classification through Training-Free Embedding Analysis [0.0]
Developing artificial intelligence (AI) and machine learning (ML) models for medical imaging typically involves extensive training and testing on large datasets.
We investigated the feasibility of replacing conventional training procedures with an embedding-based approach.
arXiv Detail & Related papers (2024-12-12T16:59:37Z) - Brain Tumor Classification on MRI in Light of Molecular Markers [56.99710477905796]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specialized MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z) - InfLocNet: Enhanced Lung Infection Localization and Disease Detection from Chest X-Ray Images Using Lightweight Deep Learning [0.5242869847419834]
This paper presents a novel, lightweight deep-learning-based segmentation-classification network.
It is designed to enhance the detection and localization of lung infections using chest X-ray images.
Our model achieves remarkable results with an Intersection over Union (IoU) of 93.59% and a Dice Similarity Coefficient (DSC) of 97.61% in lung area segmentation.
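The IoU and Dice figures quoted above follow the standard definitions for binary masks, which a short numpy sketch makes explicit:

```python
import numpy as np

def iou_dice(pred, target):
    """IoU and Dice Similarity Coefficient for binary masks (0/1 arrays).

    IoU  = |P ∩ T| / |P ∪ T|
    Dice = 2 |P ∩ T| / (|P| + |T|)
    Empty-vs-empty masks are scored as a perfect match (1.0).
    """
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)
```

Note that Dice is always at least as large as IoU for the same pair of masks, which is why segmentation papers routinely report a higher Dice than IoU on the same predictions.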
arXiv Detail & Related papers (2024-08-12T19:19:23Z) - MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks [6.8948885302235325]
We introduce visual prompting (VP) to fill the gap between input medical images and vision models pretrained on natural images.
We design a joint learning loss function containing categorisation loss and discrepancy loss, which describe the variance of prompted and plain images.
On an out-of-distribution medical dataset, our method (90.33%) outperforms FF (85.15%) by an absolute 5.18%.
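The described joint loss, a categorisation loss plus a discrepancy term between prompted and plain image features, can be sketched as follows; the cross-entropy form, the MSE discrepancy, and the weight `lam` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax_ce(logits, labels):
    """Numerically stable softmax cross-entropy over a batch."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(labels)), labels].mean())

def joint_loss(logits, labels, feat_prompted, feat_plain, lam=0.1):
    """Categorisation loss plus a discrepancy term penalising the
    distance between prompted and plain image features (MSE here)."""
    discrepancy = float(((feat_prompted - feat_plain) ** 2).mean())
    return softmax_ce(logits, labels) + lam * discrepancy
```

The discrepancy term drives the prompted features toward the plain-image features, so the prompt adapts the input distribution without letting the two representations drift apart.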
arXiv Detail & Related papers (2024-05-13T01:18:25Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
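A bottleneck residual adapter of the kind described can be sketched as below; the bottleneck width, the zero-initialised up-projection, and the plain numpy forward pass are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

class ResidualAdapter:
    """Bottleneck residual adapter: out = x + up(relu(down(x))).

    With the up-projection initialised to zero, the adapter starts as an
    exact identity; only the adapter is trained while the pre-trained
    visual encoder it is attached to stays frozen.
    """
    def __init__(self, dim, bottleneck=16, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))   # identity at initialisation

    def __call__(self, x):
        h = np.maximum(x @ self.down, 0.0)      # ReLU bottleneck
        return x + h @ self.up                  # residual connection
```

Stacking one such adapter per encoder level is what allows a "stepwise enhancement of visual features across different levels" without touching the frozen CLIP weights.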
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction [4.659998272408215]
Early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates.
We suggest a multimodal fusion methodology, termed PE-MVCNet, which capitalizes on Computed Tomography Pulmonary Angiography imaging and EMR data.
Our proposed model outperforms existing methodologies, corroborating that our multimodal fusion model excels compared to models that use a single data modality.
arXiv Detail & Related papers (2024-02-27T03:53:27Z) - Less Could Be Better: Parameter-efficient Fine-tuning Advances Medical Vision Foundation Models [71.18275399694689]
The effectiveness of PEFT on medical vision foundation models is still unclear.
We set a new state-of-the-art on a range of data-efficient learning tasks, such as an AUROC score of 80.6% using 1% labeled data on NIH ChestX-ray14.
We hope this study can evoke more attention from the community in the use of PEFT for transfer learning on medical imaging tasks.
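One widely used PEFT recipe is a LoRA-style low-rank update, sketched below; the paper does not necessarily use this exact variant, and the rank, the `alpha` scaling, and the dimensions here are illustrative.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=8):
    """y = x @ (W + (alpha / r) * A @ B).

    W is the frozen pre-trained weight; only the two low-rank factors
    A (d_in x r) and B (r x d_out) are trained, so the trainable
    parameter count drops from d_in*d_out to r*(d_in + d_out).
    """
    r = A.shape[1]
    return x @ (W + (alpha / r) * (A @ B))

d_in, d_out, r = 768, 768, 8
full_params = d_in * d_out          # trainable params in full fine-tuning
lora_params = r * (d_in + d_out)    # trainable params with the low-rank update
```

With these (assumed) dimensions the low-rank update trains roughly 2% of the parameters that full fine-tuning would, which is the "less could be better" trade-off the title refers to.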
arXiv Detail & Related papers (2024-01-22T18:59:07Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
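Text-to-image retrieval benchmarks of this kind are typically scored with recall@k over a query-image similarity matrix; the sketch below assumes paired data where image i is the true match for text query i.

```python
import numpy as np

def recall_at_k(sim, k):
    """sim[i, j] = similarity of text query i to image j.

    With paired data (image i is the ground-truth match for query i),
    recall@k is the fraction of queries whose true image appears among
    the k most similar images.
    """
    ranked = np.argsort(-sim, axis=1)                        # best image first
    hits = (ranked[:, :k] == np.arange(len(sim))[:, None]).any(axis=1)
    return float(hits.mean())
```

Reporting recall@1/@5/@10 as the paired-data budget shrinks is a common way to compare the low-data adaptation strategies the paper explores.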
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
To investigate chest radiograph (CXR) classification performance of vision transformers (ViT) and interpretability of attention-based saliency.
ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData.
ViTs had comparable CXR classification AUCs compared with state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z) - Improving Disease Classification Performance and Explainability of Deep Learning Models in Radiology with Heatmap Generators [0.0]
Three experiment sets were conducted with a U-Net architecture to improve the classification performance.
The greatest improvements were for the "pneumonia" and "CHF" classes, which the baseline model struggled most to classify.
arXiv Detail & Related papers (2022-06-28T13:03:50Z) - GSDA: Generative Adversarial Network-based Semi-Supervised Data Augmentation for Ultrasound Image Classification [8.554511144730387]
Medical Ultrasound (US) is one of the most widely used imaging modalities in clinical practice.
Deep Learning (DL) models can serve as advanced medical US image analysis tools, but their performance is greatly limited by the scarcity of large datasets.
We develop a Generative Adversarial Network (GAN)-based semi-supervised data augmentation method.
arXiv Detail & Related papers (2022-03-11T16:52:14Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using the largest and richest dataset of its kind to date.
arXiv Detail & Related papers (2021-08-07T10:12:42Z) - Y-Net for Chest X-Ray Preprocessing: Simultaneous Classification of Geometry and Segmentation of Annotations [70.0118756144807]
This work introduces a general pre-processing step for chest x-ray input into machine learning algorithms.
A modified Y-Net architecture based on the VGG11 encoder is used to simultaneously learn geometric orientation and segmentation of radiographs.
Results were evaluated by expert clinicians, with acceptable geometry in 95.8% and annotation mask in 96.2%, compared to 27.0% and 34.9% respectively in control images.
arXiv Detail & Related papers (2020-05-08T02:16:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.