Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias Analysis
- URL: http://arxiv.org/abs/2512.01534v1
- Date: Mon, 01 Dec 2025 11:03:27 GMT
- Title: Deep Unsupervised Anomaly Detection in Brain Imaging: Large-Scale Benchmarking and Bias Analysis
- Authors: Alexander Frotscher, Christian F. Baumgartner, Thomas Wolfers
- Abstract summary: We present a large-scale, multi-center benchmark of deep unsupervised anomaly detection for brain imaging. We tested 2,221 T1w and 1,262 T2w scans spanning healthy datasets and diverse clinical cohorts. Our benchmark establishes a transparent foundation for future research and highlights priorities for clinical translation.
- Score: 42.60508892284938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep unsupervised anomaly detection in brain magnetic resonance imaging offers a promising route to identify pathological deviations without requiring lesion-specific annotations. Yet, fragmented evaluations, heterogeneous datasets, and inconsistent metrics have hindered progress toward clinical translation. Here, we present a large-scale, multi-center benchmark of deep unsupervised anomaly detection for brain imaging. The training cohort comprised 2,976 T1 and 2,972 T2-weighted scans from healthy individuals across six scanners, with ages ranging from 6 to 89 years. Validation used 92 scans to tune hyperparameters and estimate unbiased thresholds. Testing encompassed 2,221 T1w and 1,262 T2w scans spanning healthy datasets and diverse clinical cohorts. Across all algorithms, the Dice-based segmentation performance varied between 0.03 and 0.65, indicating substantial variability. To assess robustness, we systematically evaluated the impact of different scanners, lesion types and sizes, as well as demographics (age, sex). Reconstruction-based methods, particularly diffusion-inspired approaches, achieved the strongest lesion segmentation performance, while feature-based methods showed greater robustness under distributional shifts. However, systematic biases, such as scanner-related effects, were observed for the majority of algorithms, including that small and low-contrast lesions were missed more often, and that false positives varied with age and sex. Increasing healthy training data yields only modest gains, underscoring that current unsupervised anomaly detection frameworks are limited algorithmically rather than by data availability. Our benchmark establishes a transparent foundation for future research and highlights priorities for clinical translation, including image native pretraining, principled deviation measures, fairness-aware modeling, and robust domain adaptation.
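The abstract reports Dice-based segmentation scores and thresholds estimated on a healthy validation set, without stating in this listing how either is computed. The following is a minimal sketch of one common way to do this for a reconstruction-based detector. The function names, the absolute-residual anomaly score, and the quantile rule with a hypothetical 5% false-positive budget are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks; defined as 1.0 when both are empty."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def threshold_from_healthy(residuals, fpr=0.05):
    """Assumed rule: choose the score that a fraction `fpr` of healthy voxels exceeds."""
    return float(np.quantile(np.asarray(residuals, dtype=float), 1.0 - fpr))


def segment(image, reconstruction, threshold):
    """Flag voxels whose absolute reconstruction residual exceeds the threshold."""
    residual = np.abs(np.asarray(image, dtype=float) - np.asarray(reconstruction, dtype=float))
    return residual > threshold


if __name__ == "__main__":
    # Toy 4x4 "scan": the model reconstructs healthy tissue (zeros),
    # so the bright 2x2 patch survives in the residual.
    image = np.zeros((4, 4))
    image[1:3, 1:3] = 1.0
    reconstruction = np.zeros((4, 4))
    lesion = image > 0.5

    mask = segment(image, reconstruction, threshold=0.5)
    print("Dice:", dice_score(mask, lesion))
```

Because the threshold is fixed on healthy data only, the false-positive rate it produces can still shift between scanners or demographic groups, which is the kind of bias the benchmark examines.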
Related papers
- Hybrid Approach for Enhancing Lesion Segmentation in Fundus Images [0.0]
Choroidal nevi are benign pigmented lesions in the eye, with a small risk of transforming into melanoma. Early detection is critical to improving survival rates, but misdiagnosis or delayed diagnosis can lead to poor outcomes. This paper proposes a novel approach that combines mathematical/clustering segmentation models with insights from U-Net.
arXiv Detail & Related papers (2025-09-29T22:10:56Z)
- Fast-staged CNN Model for Accurate pulmonary diseases and Lung cancer detection [0.0]
This research evaluates a deep learning model designed to detect lung cancer, specifically pulmonary nodules, along with eight other lung pathologies, using chest radiographs. A two-stage classification system, utilizing ensemble methods and transfer learning, is employed to first triage images into Normal or Abnormal. The model achieves notable results in classification, with a top-performing accuracy of 77%, a sensitivity of 0.713, a specificity of 0.776 during external validation, and an AUC score of 0.888.
arXiv Detail & Related papers (2024-12-16T11:47:07Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection [41.454028276986946]
We propose a novel framework Two-Stage Generative Model (TSGM) to improve brain tumor detection and segmentation.
CycleGAN is trained on unpaired data to generate abnormal images from healthy images as data prior.
VE-JP is implemented to reconstruct healthy images using synthetic paired abnormal images as a guide.
arXiv Detail & Related papers (2023-11-06T12:58:26Z)
- The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images.
Unsupervised anomaly detection approaches have been proposed using only normal data for training.
We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)
- Mitosis domain generalization in histopathology images -- The MIDOG challenge [12.69088811541426]
Recognition of mitotic figures by pathologists is subject to a strong inter-rater bias, which limits the prognostic value.
State-of-the-art deep learning methods can support the expert in this assessment but are known to strongly deteriorate when applied in a different clinical environment than was used for training.
The MICCAI MIDOG 2021 challenge has been to propose and evaluate methods that derive scanner-agnostic mitosis detection algorithms.
arXiv Detail & Related papers (2022-04-06T11:43:10Z)
- StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out of distribution samples.
This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre- and post-processing steps, creating a UAD pipeline (StRegA).
The proposed pipeline achieved a Dice score of 0.642 ± 0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859 ± 0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Improved Slice-wise Tumour Detection in Brain MRIs by Computing Dissimilarities between Latent Representations [68.8204255655161]
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods.
We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder.
We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.
arXiv Detail & Related papers (2020-07-24T14:02:09Z)
- Quantifying and Leveraging Predictive Uncertainty for Medical Image Assessment [13.330243305948278]
We propose a system that learns not only the probabilistic estimate for classification, but also an explicit uncertainty measure.
We argue that this approach is essential to account for the inherent ambiguity characteristic of medical images from different radiologic exams.
In our experiments we demonstrate that sample rejection based on the predicted uncertainty can significantly improve the ROC-AUC for various tasks.
arXiv Detail & Related papers (2020-07-08T16:47:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.