Are Natural Domain Foundation Models Useful for Medical Image Classification?
- URL: http://arxiv.org/abs/2310.19522v2
- Date: Tue, 14 Nov 2023 12:21:41 GMT
- Title: Are Natural Domain Foundation Models Useful for Medical Image Classification?
- Authors: Joana Palés Huix and Adithya Raju Ganeshan and Johan Fredin Haslum
and Magnus Söderberg and Christos Matsoukas and Kevin Smith
- Abstract summary: We evaluate the performance of five foundation models across four well-established medical imaging datasets.
DINOv2 consistently outperforms the standard practice of ImageNet pretraining.
Other foundation models failed to consistently beat this established baseline, indicating limitations in their transferability to medical image classification tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The deep learning field is converging towards the use of general foundation
models that can be easily adapted for diverse tasks. While this paradigm shift
has become common practice within the field of natural language processing,
progress has been slower in computer vision. In this paper we attempt to
address this issue by investigating the transferability of various
state-of-the-art foundation models to medical image classification tasks.
Specifically, we evaluate the performance of five foundation models, namely
SAM, SEEM, DINOv2, BLIP, and OpenCLIP across four well-established medical
imaging datasets. We explore different training settings to fully harness the
potential of these models. Our study shows mixed results. DINOv2 consistently
outperforms the standard practice of ImageNet pretraining. However, the other
foundation models fail to consistently beat this established baseline,
indicating limitations in their transferability to medical image
classification tasks.
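The "different training settings" mentioned above typically include linear probing, where the foundation model is kept frozen and only a linear classifier is trained on its features. Below is a minimal NumPy sketch of that setting; the random-projection "backbone" and the toy data are purely illustrative stand-ins for a real frozen encoder (such as DINOv2) and a medical imaging dataset.

```python
# Sketch of the linear-probe transfer setting: a frozen encoder supplies
# fixed features, and only a linear classifier is trained on the target task.
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(images):
    """Stand-in for a frozen foundation-model encoder: a fixed random
    projection followed by a nonlinearity. No parameters are updated."""
    d_in, d_feat = images.shape[1], 64
    w = np.random.default_rng(42).normal(size=(d_in, d_feat)) / np.sqrt(d_in)
    return np.tanh(images @ w)

def train_linear_probe(feats, labels, n_classes, lr=0.5, steps=200):
    """Multinomial logistic regression on the frozen features."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    y_onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = feats @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * feats.T @ (probs - y_onehot) / n   # softmax CE gradient
    return W

# Toy two-class "dataset" standing in for a medical imaging benchmark.
X = np.concatenate([rng.normal(-1, 0.5, (100, 16)),
                    rng.normal(+1, 0.5, (100, 16))])
y = np.array([0] * 100 + [1] * 100)

F = frozen_backbone(X)                       # features from frozen encoder
W = train_linear_probe(F, y, n_classes=2)    # only the probe is trained
acc = (np.argmax(F @ W, axis=1) == y).mean()
print(f"linear-probe accuracy: {acc:.2f}")
```

The design choice being probed in the paper is exactly this split: how much of the task can be solved by a cheap linear head on top of frozen features, versus how much requires fine-tuning the backbone itself.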
Related papers
- Rethinking Foundation Models for Medical Image Classification through a Benchmark Study on MedMNIST [7.017817009055001]
We study the capabilities of foundation models in medical image classification tasks by conducting a benchmark study on the MedMNIST dataset.
We adopt various foundation models ranging from convolutional to Transformer-based models and implement both end-to-end training and linear probing for all classification tasks.
arXiv Detail & Related papers (2025-01-24T18:01:07Z)
- Few-shot Adaptation of Medical Vision-Language Models [17.11090825001394]
We introduce the first structured benchmark for adapting medical vision-language models (VLMs) in a strict few-shot regime.
We evaluate a simple generalization of the linear-probe adaptation baseline, which seeks an optimal blending of the visual prototypes and text embeddings.
Surprisingly, such a text-informed linear probe yields competitive performances in comparison to convoluted prompt-learning and adapter-based strategies.
arXiv Detail & Related papers (2024-09-05T19:10:29Z)
- Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities [0.0]
This paper investigates the intricacies of using pretrained deep convolutional neural networks with transfer learning across diverse medical imaging datasets.
It shows that the use of pretrained models as fixed feature extractors yields poor performance irrespective of the datasets.
It is also found that deeper and more complex architectures did not necessarily result in the best performance.
arXiv Detail & Related papers (2024-08-30T04:51:19Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging [9.99042549094606]
We consider a recently developed foundation model for medical image segmentation, UniverSeg.
We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model.
arXiv Detail & Related papers (2023-07-06T20:00:52Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification [41.16626194300303]
Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications.
Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples.
Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks.
arXiv Detail & Related papers (2023-06-16T01:46:07Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-source a strong MedISeg repository in which each component is plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
- Domain Generalization on Medical Imaging Classification using Episodic Training with Task Augmentation [62.49837463676111]
We propose a novel scheme of episodic training with task augmentation on medical imaging classification.
Motivated by the limited number of source domains in real-world medical deployment, we consider task-level overfitting, which is unique to this setting.
arXiv Detail & Related papers (2021-06-13T03:56:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.