Generalist versus Specialist Vision Foundation Models for Ocular Disease and Oculomics
- URL: http://arxiv.org/abs/2509.03421v1
- Date: Wed, 03 Sep 2025 15:48:57 GMT
- Title: Generalist versus Specialist Vision Foundation Models for Ocular Disease and Oculomics
- Authors: Yukun Zhou, Paul Nderitu, Jocelyn Hui Lin Goh, Justin Engelmann, Siegfried K. Wagner, Anran Ran, Hongyang Jiang, Lie Ju, Ke Zou, Sahana Srinivasan, Hyunmin Kim, Takahiro Ninomiya, Zheyuan Wang, Gabriel Dawei Yang, Eden Ruffell, Dominic Williamson, Rui Santos, Gabor Mark Somfai, Carol Y. Cheung, Tien Yin Wong, Daniel C. Alexander, Yih Chung Tham, Pearse A. Keane
- Abstract summary: We evaluate the adaptability of DINOv2 and DINOv3 in retinal image applications, compared to two specialist RETFound models. RETFound-DINOv2 consistently outperforms these generalist foundation models in ocular-disease detection and oculomics tasks.
- Score: 10.703307148153433
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Medical foundation models, pre-trained with large-scale clinical data, demonstrate strong performance in diverse clinically relevant applications. RETFound, trained on nearly one million retinal images, exemplifies this approach for retinal imaging. However, the emergence of increasingly powerful and substantially larger generalist foundation models such as DINOv2 and DINOv3 raises the question of whether domain-specific pre-training remains essential and, if so, what gap persists. To investigate this, we systematically evaluated the adaptability of DINOv2 and DINOv3 in retinal image applications, compared to two specialist RETFound models, RETFound-MAE and RETFound-DINOv2. We assessed performance on ocular disease detection and systemic disease prediction using two adaptation strategies: fine-tuning and linear probing. Data efficiency and adaptation efficiency were further analysed to characterise trade-offs between predictive performance and computational cost. Our results show that although scaling generalist models yields strong adaptability across diverse tasks, RETFound-DINOv2 consistently outperforms these generalist foundation models in ocular-disease detection and oculomics tasks, demonstrating stronger generalisability and data efficiency. These findings indicate that specialist retinal foundation models remain the most effective choice for clinical applications, while the narrowing gap with generalist models suggests that continued data and model scaling can deliver domain-relevant gains, positioning them as strong foundations for future medical foundation models.
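The two adaptation strategies named in the abstract are standard practice, so a minimal sketch may help make them concrete. This is not the authors' code: the DINOv2 torch.hub checkpoint is the public generalist backbone the paper evaluates, but the head size, learning rates, and training step below are illustrative assumptions.

```python
# Minimal sketch of the two adaptation strategies compared in the paper:
# linear probing vs. full fine-tuning of a DINOv2 backbone.
# `num_classes`, learning rates, and the training step are placeholders.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
embed_dim, num_classes = 1024, 5  # e.g. a 5-grade retinopathy task (assumed)
head = nn.Linear(embed_dim, num_classes)

LINEAR_PROBE = True  # set False for full fine-tuning

if LINEAR_PROBE:
    # Linear probing: freeze the backbone, train only the linear head.
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()
    param_groups = [{"params": head.parameters(), "lr": 1e-3}]
else:
    # Fine-tuning: update all weights, typically with a smaller learning
    # rate on the pre-trained backbone than on the freshly initialised head.
    param_groups = [
        {"params": backbone.parameters(), "lr": 1e-5},
        {"params": head.parameters(), "lr": 1e-3},
    ]

optimizer = torch.optim.AdamW(param_groups, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    # DINOv2's forward pass returns the [CLS] embedding, shape (B, embed_dim).
    with torch.set_grad_enabled(not LINEAR_PROBE):
        feats = backbone(images)
    loss = criterion(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Linear probing updates only the head over frozen features, which is why it is the cheaper of the two adaptation routes when the paper analyses adaptation efficiency.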
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning [3.771396977579353]
PRETI is a retinal foundation model that integrates metadata-aware learning with robust self-supervised representation learning. We construct patient-level data pairs, associating images from the same individual to improve robustness against non-clinical variations. Experiments demonstrate that PRETI achieves state-of-the-art results across diverse diseases and biomarker predictions.
arXiv Detail & Related papers (2025-05-18T04:59:03Z) - AI-Assisted Colonoscopy: Polyp Detection and Segmentation using Foundation Models [0.10037949839020764]
In colonoscopy, 80% of the missed polyps could be detected with the help of deep learning models. In the search for algorithms capable of addressing this challenge, foundation models emerge as promising candidates. Their zero-shot or few-shot learning capabilities facilitate generalization to new data or tasks without extensive fine-tuning. A comprehensive evaluation of foundation models for polyp segmentation was conducted, assessing both detection and delimitation.
arXiv Detail & Related papers (2025-03-31T14:20:53Z) - Retinal Fundus Multi-Disease Image Classification using Hybrid CNN-Transformer-Ensemble Architectures [0.3277163122167434]
Our research is motivated by the urgent global issue of a large population affected by retinal diseases. Our primary objective is to develop a comprehensive diagnostic system capable of accurately predicting retinal diseases.
arXiv Detail & Related papers (2025-03-27T12:55:07Z) - LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks.
We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD)
LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z) - Block Expanded DINORET: Adapting Natural Domain Foundation Models for Retinal Imaging Without Catastrophic Forgetting [1.2573191100165562]
We adapted the DINOv2 vision transformer for retinal imaging classification tasks using self-supervised learning.
We generated two novel foundation models termed DINORET and BE DINORET.
Our few-shot learning studies indicated that DINORET and BE DINORET outperform RETFound in terms of data efficiency.
arXiv Detail & Related papers (2024-09-25T20:17:16Z) - Evaluating General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks [5.8941124219471055]
DINOv2 is an open-source foundation model pre-trained with self-supervised learning on 142 million curated natural images.
This study comprehensively evaluates the performance of DINOv2 for radiology.
arXiv Detail & Related papers (2023-12-04T21:47:10Z) - On the Out of Distribution Robustness of Foundation Models in Medical
Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance of various pre-trained models to unseen domains after fine-tuning them on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used it as an indicator to characterize the model's performance on out-of-distribution data (a hedged sketch of one such estimator appears after this list).
arXiv Detail & Related papers (2023-11-18T14:52:10Z) - A multi-stage machine learning model on diagnosis of esophageal
manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z) - On the Robustness of Pretraining and Self-Supervision for a Deep
Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models with ImageNet pretraining show a significant increase in performance, generalization, and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z) - IA-GCN: Interpretable Attention based Graph Convolutional Network for
Disease prediction [47.999621481852266]
We propose an interpretable graph learning-based model which interprets the clinical relevance of the input features towards the task.
In a clinical scenario, such a model can assist the clinical experts in better decision-making for diagnosis and treatment planning.
Our proposed model outperforms the compared methods, with average accuracy gains of 3.2% for Tadpole, 1.6% for UKBB Gender, and 2% for the UKBB Age prediction task.
arXiv Detail & Related papers (2021-03-29T13:04:02Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.