Towards General Purpose Vision Foundation Models for Medical Image
Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks
- URL: http://arxiv.org/abs/2312.02366v3
- Date: Thu, 28 Dec 2023 18:36:50 GMT
- Title: Towards General Purpose Vision Foundation Models for Medical Image
Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks
- Authors: Mohammed Baharoon, Waseem Qureshi, Jiahong Ouyang, Yanwu Xu,
Abdulrhman Aljouie, Wei Peng
- Abstract summary: DINOv2 is an open-source foundation model pre-trained with self-supervised learning on 142 million curated natural images.
This study comprehensively evaluates DINOv2 for radiology, conducting over 100 experiments across diverse modalities.
- Score: 6.2454947749350165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of deep learning systems into healthcare has been hindered by
the resource-intensive process of data annotation and the inability of these
systems to generalize to different data distributions. Foundation models, which
are models pre-trained on large datasets, have emerged as a solution to reduce
reliance on annotated data and enhance model generalizability and robustness.
DINOv2 is an open-source foundation model pre-trained with self-supervised
learning on 142 million curated natural images that exhibits promising
capabilities across various vision tasks. Nevertheless, a critical question
remains unanswered regarding DINOv2's adaptability to radiological imaging, and
whether its features are sufficiently general to benefit radiology image
analysis. Therefore, this study comprehensively evaluates DINOv2 for radiology,
conducting over 100 experiments across diverse modalities (X-ray, CT, and MRI).
To measure the effectiveness and generalizability of DINOv2's feature
representations, we analyze the model across medical image analysis tasks
including disease classification and organ segmentation on both 2D and 3D
images, and under different settings like kNN, few-shot learning,
linear-probing, end-to-end fine-tuning, and parameter-efficient fine-tuning.
Comparative analyses with established supervised, self-supervised, and
weakly-supervised models reveal DINOv2's superior performance and cross-task
generalizability. The findings contribute insights to potential avenues for
optimizing pre-training strategies for medical imaging and enhancing the
broader understanding of DINOv2's role in bridging the gap between natural and
radiological image analysis. Our code is available at
https://github.com/MohammedSB/DINOv2ForRadiology
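The kNN and linear-probing settings mentioned in the abstract can be sketched as below. This is a minimal illustration, not the authors' actual pipeline: it assumes image features have already been extracted with a frozen DINOv2 backbone (e.g. the 768-dimensional ViT-B/14 [CLS] embeddings), and substitutes random vectors as stand-ins for those features so the snippet is self-contained.

```python
# Sketch of two frozen-feature evaluation settings from the abstract:
# kNN classification and linear probing, both on top of fixed embeddings.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for 768-d DINOv2 ViT-B/14 features of a labeled radiology dataset.
n_train, n_test, dim, n_classes = 200, 50, 768, 3
X_train = rng.normal(size=(n_train, dim))
y_train = rng.integers(0, n_classes, size=n_train)
X_test = rng.normal(size=(n_test, dim))

# kNN evaluation: no parameters are trained; test images are classified by
# majority vote among the nearest frozen training features.
knn = KNeighborsClassifier(n_neighbors=20).fit(X_train, y_train)
knn_pred = knn.predict(X_test)

# Linear probing: a single linear classifier is fit on the frozen features,
# leaving the backbone untouched.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probe_pred = probe.predict(X_test)

print(knn_pred.shape, probe_pred.shape)
```

Both settings isolate the quality of the pre-trained representations: neither updates the backbone, which is what distinguishes them from the end-to-end and parameter-efficient fine-tuning settings also evaluated in the paper.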
Related papers
- Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method [0.0]
This study aims to establish a computer-aided diagnostic system for lung lesions using endobronchial ultrasound (EBUS).
Previous research has lacked the application of object detection models to EBUS-TBNA.
arXiv Detail & Related papers (2024-04-02T13:23:21Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification [7.205610366609243]
In this paper, we performed a glioma grading task using three clinical modalities of brain MRI data.
We compared the performance of various pre-trained deep learning models, including those based on ImageNet and DINOv2.
Our findings indicate that in our clinical dataset, DINOv2's performance was not as strong as ImageNet-based pre-trained models.
arXiv Detail & Related papers (2024-02-12T11:49:08Z)
- Deep Residual CNN for Multi-Class Chest Infection Diagnosis [1.8204773850586642]
This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections.
The implemented model, trained and validated on a dataset amalgamated from diverse sources, demonstrated a robust overall accuracy of 93%.
arXiv Detail & Related papers (2023-11-17T10:05:10Z)
- UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training [66.16134293168535]
We propose a hierarchical knowledge-enhanced pre-training framework for the universal brain MRI diagnosis, termed as UniBrain.
Specifically, UniBrain leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics.
arXiv Detail & Related papers (2023-09-13T09:22:49Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- ROCT-Net: A new ensemble deep convolutional model with improved spatial resolution learning for detecting common diseases from retinal OCT images [0.0]
This paper presents a new enhanced deep ensemble convolutional neural network for detecting retinal diseases from OCT images.
Our model generates rich and multi-resolution features by employing the learning architectures of two robust convolutional models.
Our experiments on two datasets, comparing our model with several well-known deep convolutional neural networks, show that our architecture can increase classification accuracy by up to 5%.
arXiv Detail & Related papers (2022-03-03T17:51:01Z)
- InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images [53.4351366246531]
We construct a novel interpretable dual domain network, termed InDuDoNet+, into which CT imaging process is finely embedded.
We analyze the CT values among different tissues, and merge these prior observations into a prior network for our InDuDoNet+, which significantly improves its generalization performance.
arXiv Detail & Related papers (2021-12-23T15:52:37Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Generalization of Deep Convolutional Neural Networks -- A Case-study on Open-source Chest Radiographs [2.934426478974089]
One major challenge is to build a DCNN model that performs well on both internal and external data.
We demonstrate that DCNNs may not generalize to new data, but that increasing the quality and heterogeneity of the training data helps to improve generalizability.
arXiv Detail & Related papers (2020-07-11T14:37:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including the papers listed) and is not responsible for any consequences arising from its use.