Automated Multi-label Classification of Eleven Retinal Diseases: A Benchmark of Modern Architectures and a Meta-Ensemble on a Large Synthetic Dataset
- URL: http://arxiv.org/abs/2508.15986v1
- Date: Thu, 21 Aug 2025 22:09:53 GMT
- Title: Automated Multi-label Classification of Eleven Retinal Diseases: A Benchmark of Modern Architectures and a Meta-Ensemble on a Large Synthetic Dataset
- Authors: Jerry Cao-Xue, Tien Comlekoglu, Keyi Xue, Guanliang Wang, Jiang Li, Gordon Laurie
- Abstract summary: We develop an end-to-end deep learning pipeline to classify eleven retinal diseases. We show that models trained exclusively on synthetic data can accurately classify multiple pathologies and generalize effectively to real clinical images.
- Score: 1.996975578218265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of multi-label deep learning models for retinal disease classification is often hindered by the scarcity of large, expertly annotated clinical datasets due to patient privacy concerns and high costs. The recent release of SynFundus-1M, a high-fidelity synthetic dataset with over one million fundus images, presents a novel opportunity to overcome these barriers. To establish a foundational performance benchmark for this new resource, we developed an end-to-end deep learning pipeline, training six modern architectures (ConvNeXtV2, SwinV2, ViT, ResNet, EfficientNetV2, and the RETFound foundation model) to classify eleven retinal diseases using a 5-fold multi-label stratified cross-validation strategy. We further developed a meta-ensemble model by stacking the out-of-fold predictions with an XGBoost classifier. Our final ensemble model achieved the highest performance on the internal validation set, with a macro-average Area Under the Receiver Operating Characteristic Curve (AUC) of 0.9973. Critically, the models demonstrated strong generalization to three diverse, real-world clinical datasets, achieving an AUC of 0.7972 on a combined DR dataset, an AUC of 0.9126 on the AIROGS glaucoma dataset and a macro-AUC of 0.8800 on the multi-label RFMiD dataset. This work provides a robust baseline for future research on large-scale synthetic datasets and establishes that models trained exclusively on synthetic data can accurately classify multiple pathologies and generalize effectively to real clinical images, offering a viable pathway to accelerate the development of comprehensive AI systems in ophthalmology.
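The abstract reports performance as a macro-average AUC, which for a multi-label task is the unweighted mean of the per-disease AUCs. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function names `binary_auc` and `macro_auc` and the three-label toy data are assumptions made for illustration only.

```python
from typing import List, Sequence


def binary_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC for one label via the Mann-Whitney U statistic (ties get average ranks)."""
    n = len(y_true)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sort sample indices by score, then assign 1-based ranks,
    # giving tied scores the average of the ranks they span.
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def macro_auc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Unweighted mean of per-label AUCs; rows are images, columns are labels."""
    n_labels = len(Y_true[0])
    per_label = [
        binary_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


# Hypothetical toy data: 4 images, 3 disease labels (the paper uses eleven).
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.6], [0.8, 0.3, 0.3], [0.3, 0.1, 0.7]]

print(round(macro_auc(Y_true, Y_score), 4))  # per-label AUCs 1.0, 1.0, 0.75 -> 0.9167
```

Because every label contributes equally to the mean, a rare disease with a weak AUC lowers the macro score as much as a common one would, which is why the metric is often preferred for imbalanced multi-label problems.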
Related papers
- Residual GRU+MHSA: A Lightweight Hybrid Recurrent Attention Model for Cardiovascular Disease Detection [1.267904597444312]
We propose Residual GRU with Multi-Head Self-Attention, a compact deep learning architecture for clinical records. We evaluate the model on the UCI Heart Disease dataset using 5-fold stratified cross-validation. The proposed model achieves an accuracy of 0.861, macro-F1 of 0.860, ROC-AUC of 0.908, and PR-AUC of 0.904, outperforming all baselines.
arXiv Detail & Related papers (2025-12-16T16:33:59Z)
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
- Functional Localization Enforced Deep Anomaly Detection Using Fundus Images [3.0606378255830253]
Diabetic retinopathy and age-related macular degeneration were detected reliably, whereas glaucoma remained the most frequently misclassified disease. We developed a GANomaly-based anomaly detector, achieving an AUC of 0.76 while providing inherent reconstruction-based explainability and robust generalization to unseen data.
arXiv Detail & Related papers (2025-11-23T21:56:40Z)
- Hyperparameter Optimization and Reproducibility in Deep Learning Model Training [5.851295230237131]
Reproducibility remains a critical challenge in foundation model training for histopathology. We trained a CLIP model on the QUILT-1M dataset. We identified clear trends: RandomResizedCrop values of 0.7-0.8 outperformed more aggressive (0.6) or conservative (0.9) settings.
arXiv Detail & Related papers (2025-10-16T21:57:52Z)
- A Novel Multi-branch ConvNeXt Architecture for Identifying Subtle Pathological Features in CT Scans [1.2461503242570642]
This paper introduces a novel multi-branch ConvNeXt architecture designed specifically for the nuanced challenges of medical image analysis. The proposed model incorporates a rigorous end-to-end pipeline, from meticulous data preprocessing to augmentation to a disciplined two-phase training strategy. Experimental results demonstrate a superior performance on the validation set, achieving a final ROC-AUC of 0.9937, a validation accuracy of 0.9757, and an F1-score of 0.9825 for COVID-19 cases.
arXiv Detail & Related papers (2025-10-10T08:00:46Z)
- CellPainTR: Generalizable Representation Learning for Cross-Dataset Cell Painting Analysis [51.56484100374058]
We introduce CellPainTR, a Transformer-based architecture designed to learn foundational representations of cellular morphology. Our work represents a significant step towards creating truly foundational models for image-based profiling, enabling more reliable and scalable cross-study biological analysis.
arXiv Detail & Related papers (2025-09-02T03:30:07Z)
- PySeizure: A single machine learning classifier framework to detect seizures in diverse datasets [0.0]
We introduce an innovative, open-source machine-learning framework that enables robust seizure detection across varied clinical datasets. To enhance robustness, the framework incorporates an automated pre-processing pipeline to standardise data and a majority voting mechanism. We train, tune, and evaluate models within each dataset, assessing their cross-dataset transferability.
arXiv Detail & Related papers (2025-08-10T09:12:29Z)
- A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion [5.15423063632115]
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection. Existing AI approaches fall short by focusing on single view inputs or single-task outputs. We propose a novel multi-view, multitask hybrid deep learning framework that processes all four standard mammography views.
arXiv Detail & Related papers (2025-07-22T18:52:18Z)
- Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy [0.9999629695552196]
The present work develops and validates a data-driven and interpretable machine-learning framework designed to predict strokes. Ten routinely gathered demographic, lifestyle, and clinical variables were sourced from a public cohort of 4,981 records. The proposed model achieved an accuracy rate of 97.2% and an F1-score of 97.15%, indicating a significant enhancement compared to the leading individual model.
arXiv Detail & Related papers (2025-05-18T21:46:45Z)
- Prototype-Guided Diffusion for Digital Pathology: Achieving Foundation Model Performance with Minimal Clinical Data [6.318463500874778]
We propose a prototype-guided diffusion model to generate high-fidelity synthetic pathology data at scale. Our approach ensures biologically and diagnostically meaningful variations in the generated data. We demonstrate that self-supervised features trained on our synthetic dataset achieve competitive performance despite using 60x-760x less data than models trained on large real-world datasets.
arXiv Detail & Related papers (2025-04-15T21:17:39Z)
- ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models [46.80682547774335]
We propose ScaleMAI, an agent of AI-integrated data curation and annotation. First, ScaleMAI creates a dataset of 25,362 CT scans, including per-voxel annotations for benign/malignant tumors and 24 anatomical structures. Second, through progressive human-in-the-loop iterations, ScaleMAI provides Flagship AI Model that can approach the proficiency of expert annotators in detecting pancreatic tumors.
arXiv Detail & Related papers (2025-01-06T22:12:00Z)
- A multimodal ensemble approach for clear cell renal cell carcinoma treatment outcome prediction [6.199310532720352]
We developed a multi-modal ensemble model (MMEM) that integrates clinical data, multi-omics data, and histopathology whole slide image (WSI) data. MMEM predicted overall survival (OS) and disease-free survival (DFS) for ccRCC patients.
arXiv Detail & Related papers (2024-12-10T02:51:14Z)
- Dataset Distillation for Histopathology Image Classification [46.04496989951066]
We introduce a novel dataset distillation algorithm tailored for histopathology image datasets (Histo-DD).
We conduct a comprehensive evaluation of the effectiveness of the proposed algorithm and the generated histopathology samples in both patch-level and slide-level classification tasks.
arXiv Detail & Related papers (2024-08-19T05:53:38Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.