Related papers: Comparative Analysis of Deep Learning Strategies for Hypertensive Retinopathy Detection from Fundus Images: From Scratch and Pre-trained Models

Comparative Analysis of Deep Learning Strategies for Hypertensive Retinopathy Detection from Fundus Images: From Scratch and Pre-trained Models

URL: http://arxiv.org/abs/2506.12492v1
Date: Sat, 14 Jun 2025 13:11:33 GMT
Title: Comparative Analysis of Deep Learning Strategies for Hypertensive Retinopathy Detection from Fundus Images: From Scratch and Pre-trained Models
Authors: Yanqiao Zhu,
Abstract summary: This paper presents a comparative analysis of deep learning strategies for detecting hypertensive retinopathy from fundus images.<n>We investigate three distinct approaches: a custom CNN, a suite of pre-trained transformer-based models, and an AutoML solution.
Score: 5.860609259063137
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a comparative analysis of deep learning strategies for detecting hypertensive retinopathy from fundus images, a central task in the HRDC challenge~\cite{qian2025hrdc}. We investigate three distinct approaches: a custom CNN, a suite of pre-trained transformer-based models, and an AutoML solution. Our findings reveal a stark, architecture-dependent response to data augmentation. Augmentation significantly boosts the performance of pure Vision Transformers (ViTs), which we hypothesize is due to their weaker inductive biases, forcing them to learn robust spatial and structural features. Conversely, the same augmentation strategy degrades the performance of hybrid ViT-CNN models, whose stronger, pre-existing biases from the CNN component may be "confused" by the transformations. We show that smaller patch sizes (ViT-B/8) excel on augmented data, enhancing fine-grained detail capture. Furthermore, we demonstrate that a powerful self-supervised model like DINOv2 fails on the original, limited dataset but is "rescued" by augmentation, highlighting the critical need for data diversity to unlock its potential. Preliminary tests with a ViT-Large model show poor performance, underscoring the risk of using overly-capacitive models on specialized, smaller datasets. This work provides critical insights into the interplay between model architecture, data augmentation, and dataset size for medical image classification.

Related papers

Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification [0.08192907805418585]
Self-supervised learning models provide data-efficient and remarkable solutions to limited dataset problems. This paper introduces a generative SSL model for brain tumor classification in two stages. The proposed model attains the highest accuracy, achieving 90.56% on the BraTs dataset with T1 sequence, 98.53% on the Figshare, and 98.47% on the Kaggle brain tumor datasets.
arXiv Detail & Related papers (2024-11-19T21:42:57Z)
Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training. The capacity to generalize effectively on smaller datasets remains a persistent challenge. We've combined a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z)
A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance. We propose a simple yet effective data augmentation approach by leveraging advancements in generative models. Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
Deep models for stroke segmentation: do complex architectures always perform better? [1.4651272514940197]
Stroke segmentation plays a crucial role in the diagnosis and treatment of stroke patients. Deep models have been introduced for general medical image segmentation. In this study, we selected four types of deep models that were recently proposed and evaluated their performance for stroke segmentation.
arXiv Detail & Related papers (2024-03-25T20:44:01Z)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets. We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare. Within this framework, we train predictive 15 models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision. Model compression detrimentally impacts the performance of visual prompting-based transfer. However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z)
Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection [2.359557447960552]
Vision transformers (ViT) have emerged in recent years as an alternative to CNNs for several computer vision applications. We tested variants of the ViT architecture for a range of desired neuroimaging downstream tasks based on difficulty. We achieved a performance boost of 5% and 9-10% upon fine-tuning vision transformer models pre-trained on synthetic and real MRI scans.
arXiv Detail & Related papers (2023-03-14T20:18:12Z)
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images. AMIGO uses the celluar graph within the tissue to provide a single representation for a patient. We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
Are Vision Transformers Robust to Spurious Correlations? [23.73056953692978]
Deep neural networks may be susceptible to learning spurious correlations that hold on average but not in atypical test samples. We investigate the robustness of vision transformers to spurious correlations on three benchmark datasets. Key to their success is the ability to generalize better from the examples where spurious correlations do not hold.
arXiv Detail & Related papers (2022-03-17T07:03:37Z)
Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.