Towards General Purpose Vision Foundation Models for Medical Image
Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks
- URL: http://arxiv.org/abs/2312.02366v3
- Date: Thu, 28 Dec 2023 18:36:50 GMT
- Title: Towards General Purpose Vision Foundation Models for Medical Image
Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks
- Authors: Mohammed Baharoon, Waseem Qureshi, Jiahong Ouyang, Yanwu Xu,
Abdulrhman Aljouie, Wei Peng
- Abstract summary: DINOv2 is an open-source foundation model pre-trained with self-supervised learning on 142 million curated natural images.
This study comprehensively evaluates DINOv2 for radiology, conducting over 100 experiments across diverse modalities.
- Score: 6.2454947749350165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of deep learning systems into healthcare has been hindered by
the resource-intensive process of data annotation and the inability of these
systems to generalize to different data distributions. Foundation models, which
are models pre-trained on large datasets, have emerged as a solution to reduce
reliance on annotated data and enhance model generalizability and robustness.
DINOv2 is an open-source foundation model pre-trained with self-supervised
learning on 142 million curated natural images that exhibits promising
capabilities across various vision tasks. Nevertheless, a critical question
remains unanswered regarding DINOv2's adaptability to radiological imaging, and
whether its features are sufficiently general to benefit radiology image
analysis. Therefore, this study comprehensively evaluates DINOv2 for radiology,
conducting over 100 experiments across diverse modalities (X-ray, CT, and MRI).
To measure the effectiveness and generalizability of DINOv2's feature
representations, we analyze the model across medical image analysis tasks
including disease classification and organ segmentation on both 2D and 3D
images, and under different settings like kNN, few-shot learning,
linear-probing, end-to-end fine-tuning, and parameter-efficient fine-tuning.
Comparative analyses with established supervised, self-supervised, and
weakly-supervised models reveal DINOv2's superior performance and cross-task
generalizability. The findings contribute insights to potential avenues for
optimizing pre-training strategies for medical imaging and enhancing the
broader understanding of DINOv2's role in bridging the gap between natural and
radiological image analysis. Our code is available at
https://github.com/MohammedSB/DINOv2ForRadiology
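The kNN and linear-probing settings mentioned in the abstract can be sketched as below. This is a minimal illustration, not the authors' actual pipeline: it assumes image features have already been extracted with a frozen DINOv2 backbone (e.g. the 768-dimensional ViT-B/14 [CLS] embeddings), and substitutes random vectors as stand-ins for those features so the snippet is self-contained.

```python
# Sketch of two frozen-feature evaluation settings from the abstract:
# kNN classification and linear probing, both on top of fixed embeddings.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for 768-d DINOv2 ViT-B/14 features of a labeled radiology dataset.
n_train, n_test, dim, n_classes = 200, 50, 768, 3
X_train = rng.normal(size=(n_train, dim))
y_train = rng.integers(0, n_classes, size=n_train)
X_test = rng.normal(size=(n_test, dim))

# kNN evaluation: no parameters are trained; test images are classified by
# majority vote among the nearest frozen training features.
knn = KNeighborsClassifier(n_neighbors=20).fit(X_train, y_train)
knn_pred = knn.predict(X_test)

# Linear probing: a single linear classifier is fit on the frozen features,
# leaving the backbone untouched.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probe_pred = probe.predict(X_test)

print(knn_pred.shape, probe_pred.shape)
```

Both settings isolate the quality of the pre-trained representations: neither updates the backbone, which is what distinguishes them from the end-to-end and parameter-efficient fine-tuning settings also evaluated in the paper.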
Related papers
- Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method [0.0]
This study aims to establish a computer-aided diagnostic system for lung lesions using endobronchial ultrasound (EBUS).
Previous research has lacked the application of object detection models to EBUS-TBNA.
arXiv Detail & Related papers (2024-04-02T13:23:21Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification [7.205610366609243]
In this paper, we performed a glioma grading task using three clinical modalities of brain MRI data.
We compared the performance of various pre-trained deep learning models, including those based on ImageNet and DINOv2.
Our findings indicate that in our clinical dataset, DINOv2's performance was not as strong as ImageNet-based pre-trained models.
arXiv Detail & Related papers (2024-02-12T11:49:08Z)
- Deep Residual CNN for Multi-Class Chest Infection Diagnosis [1.8204773850586642]
This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections.
The implemented model, trained and validated on a dataset amalgamated from diverse sources, demonstrated a robust overall accuracy of 93%.
arXiv Detail & Related papers (2023-11-17T10:05:10Z)
- UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training [66.16134293168535]
We propose a hierarchical knowledge-enhanced pre-training framework for the universal brain MRI diagnosis, termed as UniBrain.
Specifically, UniBrain leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics.
arXiv Detail & Related papers (2023-09-13T09:22:49Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- ROCT-Net: A new ensemble deep convolutional model with improved spatial resolution learning for detecting common diseases from retinal OCT images [0.0]
This paper presents a new enhanced deep ensemble convolutional neural network for detecting retinal diseases from OCT images.
Our model generates rich and multi-resolution features by employing the learning architectures of two robust convolutional models.
Our experiments on two datasets, comparing our model with several well-known deep convolutional neural networks, show that our architecture can increase classification accuracy by up to 5%.
arXiv Detail & Related papers (2022-03-03T17:51:01Z)
- InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images [53.4351366246531]
We construct a novel interpretable dual domain network, termed InDuDoNet+, into which CT imaging process is finely embedded.
We analyze the CT values among different tissues, and merge these prior observations into a prior network for our InDuDoNet+, which significantly improves its generalization performance.
arXiv Detail & Related papers (2021-12-23T15:52:37Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Generalization of Deep Convolutional Neural Networks -- A Case-study on Open-source Chest Radiographs [2.934426478974089]
One major challenge is to build a DCNN model that performs well on both internal and external data.
We demonstrate that DCNNs may not generalize to new data, but that increasing the quality and heterogeneity of the training data helps to improve generalizability.
arXiv Detail & Related papers (2020-07-11T14:37:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including the papers listed) and is not responsible for any consequences arising from its use.