Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection
- URL: http://arxiv.org/abs/2505.05291v2
- Date: Thu, 22 May 2025 10:41:49 GMT
- Title: Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection
- Authors: Benjamin A. Cohen, Jonathan Fhima, Meishar Meisel, Baskin Meital, Luis Filipe Nakayama, Eran Berkowitz, Joachim A. Behar
- Abstract summary: Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to learn robust representations from large-scale natural image datasets. We show that iBOT pretrained on natural images achieves the highest out-of-distribution generalization.
- Score: 1.3162645769999362
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to learn robust representations from large-scale natural image datasets, enhancing their generalization across domains. In retinal imaging, foundation models pretrained on either natural or ophthalmic data have shown promise, but the benefits of in-domain pretraining remain uncertain. To investigate this, we benchmark six SSL-pretrained ViTs on seven digital fundus image (DFI) datasets totaling 70,000 expert-annotated images for the task of moderate-to-late age-related macular degeneration (AMD) identification. Our results show that iBOT pretrained on natural images achieves the highest out-of-distribution generalization, with AUROCs of 0.80-0.97, outperforming domain-specific models, which achieved AUROCs of 0.78-0.96 and a baseline ViT-L with no pretraining, which achieved AUROCs of 0.68-0.91. These findings highlight the value of foundation models in improving AMD identification and challenge the assumption that in-domain pretraining is necessary. Furthermore, we release BRAMD, an open-access dataset (n=587) of DFIs with AMD labels from Brazil.
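As an illustration of the evaluation protocol the abstract describes, here is a minimal linear-probe sketch: a frozen SSL-pretrained ViT embeds fundus images and a linear classifier is scored with AUROC. The backbone name and data are stand-ins, not the authors' setup (iBOT checkpoints are distributed separately), and the paper may fine-tune rather than linear-probe.

```python
# Minimal sketch (assumptions, not the authors' code): frozen ViT features
# plus a linear probe, scored with AUROC for binary AMD identification.
import numpy as np
import timm
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical backbone; iBOT weights would be loaded from a separate checkpoint.
backbone = timm.create_model("vit_large_patch16_224", pretrained=True, num_classes=0)
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224), already normalized."""
    return backbone(images).cpu().numpy()

# Stand-in DFI batches and AMD labels (0 = no/early AMD, 1 = moderate-to-late).
X_train, y_train = torch.randn(64, 3, 224, 224), np.random.randint(0, 2, 64)
X_test, y_test = torch.randn(32, 3, 224, 224), np.random.randint(0, 2, 32)

probe = LogisticRegression(max_iter=1000).fit(embed(X_train), y_train)
print("AUROC:", roc_auc_score(y_test, probe.predict_proba(embed(X_test))[:, 1]))
```

Swapping the backbone string for a domain-specific checkpoint reproduces the natural-vs-ophthalmic comparison at the level of this sketch.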
Related papers
- HOG-CNN: Integrating Histogram of Oriented Gradients with Convolutional Neural Networks for Retinal Image Classification [1.5939351525664014]
We propose an automated and interpretable clinical decision support framework based on a hybrid feature extraction model called HOG-CNN. Our key contribution lies in the integration of handcrafted Histogram of Oriented Gradients (HOG) features with deep convolutional neural network (CNN) representations. Our model achieves 98.5% accuracy and 99.2 AUC for binary DR classification, and 94.2 AUC for five-class DR classification.
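A hedged sketch of the hybrid-feature idea as described in the summary (not the paper's code): handcrafted HOG descriptors are concatenated with CNN embeddings before classification. The ResNet-18 backbone and HOG parameters are placeholders.

```python
import numpy as np
import torch
import torchvision.models as models
from skimage.feature import hog

# Stand-in CNN; the paper's backbone is not specified in this summary.
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()           # expose the 512-d penultimate features
cnn.eval()

def hybrid_features(img_gray: np.ndarray, img_rgb: torch.Tensor) -> np.ndarray:
    """img_gray: (H, W) float array; img_rgb: (3, 224, 224) normalized tensor."""
    h = hog(img_gray, orientations=9, pixels_per_cell=(16, 16),
            cells_per_block=(2, 2))     # handcrafted gradient descriptor
    with torch.no_grad():
        c = cnn(img_rgb.unsqueeze(0)).squeeze(0).numpy()  # learned descriptor
    return np.concatenate([h, c])       # fused vector for a downstream classifier
```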
arXiv Detail & Related papers (2025-07-29T22:54:28Z)
- PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning [3.771396977579353]
PRETI is a retinal foundation model that integrates metadata-aware learning with robust self-supervised representation learning. We construct patient-level data pairs, associating images from the same individual to improve robustness against non-clinical variations. Experiments demonstrate PRETI achieves state-of-the-art results across diverse diseases and biomarker predictions.
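The "patient-level data pairs" idea can be sketched as below; the pairing helper is hypothetical and only illustrates treating two images of the same patient as contrastive positives.

```python
# Hypothetical sketch: sample two fundus images of the same patient as a
# positive pair, so pretraining learns invariance to non-clinical variation
# (device, illumination, visit) rather than to patient identity.
import random
from collections import defaultdict

def patient_pairs(records):
    """records: iterable of (patient_id, image_path). Yields same-patient pairs."""
    by_patient = defaultdict(list)
    for pid, path in records:
        by_patient[pid].append(path)
    for pid, paths in by_patient.items():
        if len(paths) >= 2:
            yield tuple(random.sample(paths, 2))  # one positive pair per patient
```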
arXiv Detail & Related papers (2025-05-18T04:59:03Z)
- Deep Learning Ensemble for Predicting Diabetic Macular Edema Onset Using Ultra-Wide Field Color Fundus Image [2.9945018168793025]
Diabetic macular edema (DME) is a severe complication of diabetes. We propose an ensemble method to predict center-involved DME (ci-DME) onset within a year.
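A generic sketch of probability-level ensembling, assumed from the summary (the paper's member models and weighting are not specified here):

```python
# Sketch: average the positive-class probabilities of several fitted models
# to predict one-year ci-DME onset from ultra-wide-field fundus features.
import numpy as np

def ensemble_predict(models, x, threshold=0.5):
    """models: fitted classifiers with predict_proba; x: feature matrix."""
    probs = np.mean([m.predict_proba(x)[:, 1] for m in models], axis=0)
    return probs, (probs >= threshold).astype(int)  # risk scores, hard labels
```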
arXiv Detail & Related papers (2024-10-09T02:16:29Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive natural image and text data, have emerged as a promising approach.
We compare the generalization performance on unseen domains of various pre-trained models after fine-tuning on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used it as an indicator to characterize the model's performance on out-of-distribution data.
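The abstract does not specify the Bayesian estimator, so the sketch below uses a common stand-in, Monte Carlo dropout over a frozen model, with predictive variance as the out-of-distribution indicator:

```python
# Stand-in for the paper's Bayesian uncertainty estimate: MC dropout.
# Assumes a binary model emitting a single logit; adapt for segmentation maps.
import torch

@torch.no_grad()
def mc_dropout_uncertainty(model: torch.nn.Module, x: torch.Tensor, n: int = 20):
    model.eval()
    for m in model.modules():              # keep dropout active at inference time
        if isinstance(m, torch.nn.Dropout):
            m.train()
    preds = torch.stack([torch.sigmoid(model(x)) for _ in range(n)])
    return preds.mean(0), preds.var(0)     # mean prediction, uncertainty
```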
arXiv Detail & Related papers (2023-11-18T14:52:10Z)
- Uncertainty-inspired Open Set Learning for Retinal Anomaly Identification [71.06194656633447]
We establish an uncertainty-inspired open-set (UIOS) model, which was trained with fundus images of 9 retinal conditions.
Our UIOS model with thresholding strategy achieved F1 scores of 99.55%, 97.01% and 91.91% on the internal and two external testing sets.
UIOS correctly predicted high uncertainty scores, prompting a manual check, on datasets of non-target-category retinal diseases, low-quality fundus images, and non-fundus images.
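The thresholding strategy reduces to a simple triage rule; a sketch with a hypothetical threshold tau chosen on a validation set:

```python
# Sketch of the thresholding strategy: high-uncertainty cases are routed to
# manual review instead of being auto-labeled. Names are hypothetical.
def triage(prob: float, uncertainty: float, tau: float) -> str:
    """prob: predicted disease probability; tau: validation-set threshold."""
    if uncertainty > tau:
        return "manual_review"             # open-set / low-quality / non-fundus
    return "disease" if prob >= 0.5 else "no_disease"
```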
arXiv Detail & Related papers (2023-04-08T10:47:41Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD on data from Brazilian hospitals.
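For concreteness, a minimal XGBoost baseline using the library's public API; the feature and label arrays are synthetic stand-ins for hospital vitals and labs:

```python
# Sketch of the gradient-boosted baseline; hyperparameters are illustrative.
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(200, 12)              # stand-in vitals/labs features
y = np.random.randint(0, 2, 200)         # stand-in CD labels (0/1)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X, y)
cd_risk = model.predict_proba(X)[:, 1]   # per-patient deterioration risk
```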
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
- Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed EchoCLR, a self-supervised contrastive learning approach tailored to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS).
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small labeled datasets.
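EchoCLR's exact objective is not given in this summary; a standard SimCLR-style NT-Xent loss over paired clip embeddings is a reasonable sketch:

```python
# Sketch: NT-Xent contrastive loss, here applied to embeddings of two video
# clips treated as a positive pair (a stand-in for EchoCLR's objective).
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """z1, z2: (N, D) embeddings of paired clips (positives at matching rows)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)      # (2N, D), unit-norm
    sim = z @ z.t() / temperature                    # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)             # pull positives together
```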
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
- Robust deep learning for eye fundus images: Bridging real and synthetic data for enhancing generalization [0.8599177028761124]
This work compares ten different GAN architectures to generate synthetic eye-fundus images with and without AMD.
StyleGAN2 reached the lowest Fréchet Inception Distance (166.17), and clinicians could not accurately differentiate between real and synthetic images.
The accuracy rates were 82.8% for the test set and 81.3% for the STARE dataset, demonstrating the model's generalizability.
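The reported FID can be computed from its definition, the Fréchet distance between Gaussians fitted to Inception features of real and synthetic images; a sketch assuming precomputed feature matrices:

```python
# FID from its definition: ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2*(C_r C_f)^{1/2}).
# feats_* are (N, D) Inception feature matrices, assumed precomputed.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real               # drop tiny imaginary parts from sqrtm
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean))
```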
arXiv Detail & Related papers (2022-03-25T18:42:20Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using the largest and richest dataset of its kind to date.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- Self6D: Self-Supervised Monocular 6D Object Pose Estimation [114.18496727590481]
We propose the idea of monocular 6D pose estimation by means of self-supervised learning.
We leverage recent advances in neural rendering to further self-supervise the model on unannotated real RGB-D data.
arXiv Detail & Related papers (2020-04-14T13:16:36Z)