Related papers: Functional Localization Enforced Deep Anomaly Detection Using Fundus Images

URL: http://arxiv.org/abs/2511.18627v1
Date: Sun, 23 Nov 2025 21:56:40 GMT
Title: Functional Localization Enforced Deep Anomaly Detection Using Fundus Images
Authors: Jan Benedikt Ruhland, Thorsten Papenbrock, Jan-Peter Sowa, Ali Canbay, Nicole Eter, Bernd Freisleben, Dominik Heider
Abstract summary: Diabetic retinopathy and age-related macular degeneration were detected reliably, whereas glaucoma remained the most frequently misclassified disease. We developed a GANomaly-based anomaly detector, achieving an AUC of 0.76 while providing inherent reconstruction-based explainability and robust generalization to unseen data.
Score: 3.0606378255830253
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Reliable detection of retinal diseases from fundus images is challenged by the variability in imaging quality, subtle early-stage manifestations, and domain shift across datasets. In this study, we systematically evaluated a Vision Transformer (ViT) classifier under multiple augmentation and enhancement strategies across several heterogeneous public datasets, as well as the AEyeDB dataset, a high-quality fundus dataset created in-house and made available for the research community. The ViT demonstrated consistently strong performance, with accuracies ranging from 0.789 to 0.843 across datasets and diseases. Diabetic retinopathy and age-related macular degeneration were detected reliably, whereas glaucoma remained the most frequently misclassified disease. Geometric and color augmentations provided the most stable improvements, while histogram equalization benefited datasets dominated by structural subtlety. Laplacian enhancement reduced performance across different settings. On the Papila dataset, the ViT with geometric augmentation achieved an AUC of 0.91, outperforming previously reported convolutional ensemble baselines (AUC of 0.87), underscoring the advantages of transformer architectures and multi-dataset training. To complement the classifier, we developed a GANomaly-based anomaly detector, achieving an AUC of 0.76 while providing inherent reconstruction-based explainability and robust generalization to unseen data. Probabilistic calibration using GUESS enabled threshold-independent decision support for future clinical implementation.
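
The AUC values quoted in the abstract (0.91 for the ViT on Papila, 0.76 for the anomaly detector) measure how reliably a score ranks diseased images above healthy ones. The following is a minimal sketch of that ranking interpretation, not the authors' evaluation code; the function name `roc_auc` and the scores and labels are hypothetical and chosen only for illustration.

```python
def roc_auc(scores, labels):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive sample has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores (higher = more anomalous) and ground-truth
# labels (1 = diseased, 0 = healthy); these are not values from the paper.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Because this definition depends only on the ordering of the scores, it does not require choosing a decision threshold, which is why AUC is commonly reported alongside threshold-dependent metrics such as accuracy.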

Related papers

HypCBC: Domain-Invariant Hyperbolic Cross-Branch Consistency for Generalizable Medical Image Analysis [1.8747639074211104]
We present the first comprehensive validation of hyperbolic representation learning for medical image analysis. We demonstrate statistically significant gains across eleven in-distribution datasets and three ViT models. Our proposed method promotes domain-invariant features and outperforms state-of-the-art Euclidean methods by an average of +2.1% AUC.
arXiv Detail & Related papers (2026-02-03T08:50:24Z)
Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis [7.945705180020063]
We propose a knowledge-enhanced joint embedding framework that integrates retinal fundus images, clinical text, and structured patient data. Our framework achieves near-perfect text-to-image retrieval performance with Recall@1 of 99.94% compared to fine-tuned CLIP's 1.29%.
arXiv Detail & Related papers (2025-12-22T18:41:45Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
Automated Multi-label Classification of Eleven Retinal Diseases: A Benchmark of Modern Architectures and a Meta-Ensemble on a Large Synthetic Dataset [1.996975578218265]
We develop an end-to-end deep learning pipeline to classify eleven retinal diseases. We show that models trained exclusively on synthetic data can accurately classify multiple pathologies and generalize effectively to real clinical images.
arXiv Detail & Related papers (2025-08-21T22:09:53Z)
PySeizure: A single machine learning classifier framework to detect seizures in diverse datasets [0.0]
We introduce an innovative, open-source machine-learning framework that enables robust seizure detection across varied clinical datasets. To enhance robustness, the framework incorporates an automated pre-processing pipeline to standardise data and a majority voting mechanism. We train, tune, and evaluate models within each dataset, assessing their cross-dataset transferability.
arXiv Detail & Related papers (2025-08-10T09:12:29Z)
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology [41.34847597178388]
Vision foundation models (FMs) learn to represent histological features in highly heterogeneous tiles extracted from whole-slide images. We investigate the potential of unsupervised automatic data curation at the tile-level, taking into account 350 million tiles.
arXiv Detail & Related papers (2025-03-24T14:23:48Z)
ADformer: A Multi-Granularity Spatial-Temporal Transformer for EEG-Based Alzheimer Detection [42.72554952799386]
EEG has emerged as a cost-effective and efficient tool to support neurologists in the detection of Alzheimer's Disease (AD). We propose ADformer, a novel multi-granularity spatial-temporal transformer designed to capture both temporal and spatial features from raw EEG signals. Experimental results demonstrate that ADformer consistently outperforms existing methods, achieving subject-level F1 scores of 92.82%, 89.83%, 67.99%, and 83.98% on the 4 datasets, respectively.
arXiv Detail & Related papers (2024-08-17T14:10:41Z)
Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma [4.578027879885667]
This research aims to improve glioblastoma survival prediction by integrating MR images, clinical and molecular-pathologic data in a transformer-based deep learning model. The model employs self-supervised learning techniques to effectively encode the high-dimensional MRI input for integration with non-imaging data using cross-attention.
arXiv Detail & Related papers (2024-05-21T17:44:48Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images [45.894671834869975]
Glaucoma is one of the major eye diseases that leads to progressive optic nerve fiber damage and irreversible blindness. We introduce the Multi-scale Spatio-temporal Transformer Network (MST-former) based on the transformer architecture tailored for sequential image inputs. Our method shows excellent generalization capability on the Alzheimer's Disease Neuroimaging Initiative (ADNI) MRI dataset, with an accuracy of 90.3% for mild cognitive impairment and Alzheimer's disease prediction.
arXiv Detail & Related papers (2024-02-21T02:16:59Z)
Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms. We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z)
Multi-Label Retinal Disease Classification using Transformers [0.0]
A new multi-label retinal disease dataset, MuReD, is constructed, using a number of publicly available datasets for fundus disease classification. A transformer-based model optimized through extensive experimentation is used for image analysis and decision making. It is shown that the approach performs better than state-of-the-art works on the same task by 7.9% and 8.1% in terms of AUC score for disease detection and disease classification.
arXiv Detail & Related papers (2022-07-05T22:06:52Z)
Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images. Good results were obtained in sub-fractures with the largest and richest dataset ever.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans. With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet. Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.