NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI
- URL: http://arxiv.org/abs/2505.14064v1
- Date: Tue, 20 May 2025 08:10:57 GMT
- Title: NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI
- Authors: Cosmin I. Bercea, Jun Li, Philipp Raffler, Evamaria O. Riedel, Lena Schmitzer, Angela Kurz, Felix Bitzer, Paula Roßmüller, Julian Canisius, Mirjam L. Beyrle, Che Liu, Wenjia Bai, Bernhard Kainz, Julia A. Schnabel, Benedikt Wiestler,
- Abstract summary: In many real-world applications, deployed models encounter inputs that differ from the data seen during training.<n>We present $NOVA$, a challenging, real-life $sim$900 brain MRI scans that span 281 rare pathologies and heterogeneous acquisition protocols.<n>Each case includes rich clinical narratives and double-blinded expert bounding-box annotations.<n>Because NOVA is never used for training, it serves as an $extreme$ stress-test of out-of-distribution generalisation.
- Score: 11.644236197583405
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Out-of-distribution detection identifies whether an input stems from an unseen distribution, while open-world recognition flags such inputs to ensure the system remains robust as ever-emerging, previously $unknown$ categories appear and must be addressed without retraining. Foundation and vision-language models are pre-trained on large and diverse datasets with the expectation of broad generalization across domains, including medical imaging. However, benchmarking these models on test sets with only a few common outlier types silently collapses the evaluation back to a closed-set problem, masking failures on rare or truly novel conditions encountered in clinical use. We therefore present $NOVA$, a challenging, real-life $evaluation-only$ benchmark of $\sim$900 brain MRI scans that span 281 rare pathologies and heterogeneous acquisition protocols. Each case includes rich clinical narratives and double-blinded expert bounding-box annotations. Together, these enable joint assessment of anomaly localisation, visual captioning, and diagnostic reasoning. Because NOVA is never used for training, it serves as an $extreme$ stress-test of out-of-distribution generalisation: models must bridge a distribution gap both in sample appearance and in semantic space. Baseline results with leading vision-language models (GPT-4o, Gemini 2.0 Flash, and Qwen2.5-VL-72B) reveal substantial performance drops across all tasks, establishing NOVA as a rigorous testbed for advancing models that can detect, localize, and reason about truly unknown anomalies.
Related papers
- OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection [1.0190194769786831]
Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data.<n>We propose a novel method that tightly couples representation learning with an analytically solvable one-class SVM.<n>The model is evaluated on two tasks: a new benchmark based on MNIST-C, and a challenging brain MRI subtle lesion detection task.
arXiv Detail & Related papers (2025-07-25T13:00:40Z) - BenchReAD: A systematic benchmark for retinal anomaly detection [23.15668882564837]
We introduce a benchmark for retinal anomaly detection, which is comprehensive and systematic in terms of data and algorithm.<n>We find that a fully supervised approach leveraging disentangled representations of abnormalities (DRA) achieves the best performance but suffers from significant drops in performance when encountering certain unseen anomalies.<n>Inspired by the memory bank mechanisms in one-class supervised learning, we propose NFM-DRA, which integrates DRA with a Normal Feature Memory to mitigate the performance degradation.
arXiv Detail & Related papers (2025-07-14T17:13:08Z) - Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection [88.34095233600719]
FAPrompt is a novel framework designed to learn Fine-grained Abnormality Prompts for more accurate ZSAD.
It substantially outperforms state-of-the-art methods by at least 3%-5% AUC/AP in both image- and pixel-level ZSAD tasks.
arXiv Detail & Related papers (2024-10-14T08:41:31Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts [25.629973843455495]
Generalist Anomaly Detection (GAD) aims to train one single detection model that can generalize to detect anomalies in diverse datasets from different application domains without further training on the target data.
We introduce a novel approach that learns an in-context residual learning model for GAD, termed InCTRL.
InCTRL is the best performer and significantly outperforms state-of-the-art competing methods.
arXiv Detail & Related papers (2024-03-11T08:07:46Z) - Rescuing referral failures during automated diagnosis of domain-shifted
medical images [17.349847762608086]
We show that even state-of-the-art domain generalization approaches fail severely during referral when tested on medical images acquired from a different demographic or using a different technology.
We evaluate novel combinations of robust generalization and post hoc referral approaches, that rescue these failures and achieve significant performance improvements.
arXiv Detail & Related papers (2023-11-28T13:14:55Z) - Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic Video-temporal PAs by inpainting a masked out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting.
Our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z) - Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z) - Dual-distribution discrepancy with self-supervised refinement for
anomaly detection in medical images [29.57501199670898]
We introduce one-class semi-supervised learning (OC-SSL) to utilize known normal and unlabeled images for training.
Ensembles of reconstruction networks are designed to model the distribution of normal images and the distribution of both normal and unlabeled images.
We propose a new perspective on self-supervised learning, which is designed to refine the anomaly scores rather than detect anomalies directly.
arXiv Detail & Related papers (2022-10-09T11:18:45Z) - StRegA: Unsupervised Anomaly Detection in Brain MRIs using a Compact
Context-encoding Variational Autoencoder [48.2010192865749]
Unsupervised anomaly detection (UAD) can learn a data distribution from an unlabelled dataset of healthy subjects and then be applied to detect out of distribution samples.
This research proposes a compact version of the "context-encoding" VAE (ceVAE) model, combined with pre and post-processing steps, creating a UAD pipeline (StRegA)
The proposed pipeline achieved a Dice score of 0.642$pm$0.101 while detecting tumours in T2w images of the BraTS dataset and 0.859$pm$0.112 while detecting artificially induced anomalies.
arXiv Detail & Related papers (2022-01-31T14:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.