Rethinking Metrics and Benchmarks of Video Anomaly Detection
- URL: http://arxiv.org/abs/2505.19022v1
- Date: Sun, 25 May 2025 08:09:42 GMT
- Title: Rethinking Metrics and Benchmarks of Video Anomaly Detection
- Authors: Zihao Liu, Xiaoyu Wu, Wenna Li, Linlin Yang
- Abstract summary: Video Anomaly Detection (VAD) aims to detect anomalies that deviate from expectation. In this paper, we rethink VAD evaluation protocols through comprehensive experimental analyses. We propose three novel evaluation methods to address the limitations we identify.
- Score: 12.500876355560184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Anomaly Detection (VAD), which aims to detect anomalies that deviate from expectation, has attracted increasing attention in recent years. Existing advancements in VAD primarily focus on model architectures and training strategies, while devoting insufficient attention to evaluation metrics and benchmarks. In this paper, we rethink VAD evaluation protocols through comprehensive experimental analyses, revealing three critical limitations in current practices: 1) existing metrics are significantly influenced by single annotation bias; 2) current metrics fail to reward early detection of anomalies; 3) available benchmarks lack the capability to evaluate scene overfitting. To address these limitations, we propose three novel evaluation methods: first, we establish averaged AUC/AP metrics over multi-round annotations to mitigate single annotation bias; second, we develop a Latency-aware Average Precision (LaAP) metric that rewards early and accurate anomaly detection; and finally, we introduce two hard normal benchmarks (UCF-HN, MSAD-HN) with videos specifically designed to evaluate scene overfitting. We report performance comparisons of ten state-of-the-art VAD approaches using our proposed evaluation methods, providing novel perspectives for future VAD model development.
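The first proposed fix, averaging frame-level AUC over several independent annotation rounds, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the helper names, the rank-based AUC formula, and the example annotations are all assumptions made here for clarity.

```python
from statistics import mean

def roc_auc(labels, scores):
    """Frame-level ROC AUC via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def averaged_auc(annotation_rounds, scores):
    """Mean AUC over multiple annotation rounds, so that no single
    annotator's choice of anomaly boundaries dominates the score."""
    return mean(roc_auc(labels, scores) for labels in annotation_rounds)

# Toy example: three annotators disagree on where the anomaly starts/ends.
scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3]          # per-frame anomaly scores
rounds = [
    [0, 0, 1, 1, 1, 0],                           # annotator 1
    [0, 1, 1, 1, 1, 0],                           # annotator 2 (earlier onset)
    [0, 0, 0, 1, 1, 0],                           # annotator 3 (later onset)
]
print(round(averaged_auc(rounds, scores), 3))
```

A single-round AUC against annotator 1 here is a perfect 1.0, while the average over all three rounds is lower, illustrating how a lone annotation can flatter a detector.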
Related papers
- Continual-MEGA: A Large-scale Benchmark for Generalizable Continual Anomaly Detection [11.416875086993139]
We introduce a new benchmark for continual learning in anomaly detection, aimed at better reflecting real-world deployment scenarios. Our benchmark, Continual-MEGA, includes a large and diverse dataset that significantly expands existing evaluation settings. We propose a novel scenario that measures zero-shot generalization to unseen classes, those not observed during continual adaptation.
arXiv Detail & Related papers (2025-06-01T11:00:24Z) - Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment [63.07424521895492]
Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to machine learning models T. The standard evaluation framework for such attacks relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI attacks and defenses without question.
arXiv Detail & Related papers (2025-05-06T13:32:12Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences and the result reveals that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z) - AUPIMO: Redefining Visual Anomaly Detection Benchmarks with High Speed and Low Tolerance [0.562479170374811]
Per-IMage Overlap (PIMO) is a novel metric that addresses the shortcomings of AUROC and AUPRO.
Measuring recall per image simplifies computation and is more robust to noisy annotations.
Our experiments demonstrate that PIMO offers practical advantages and nuanced performance insights.
arXiv Detail & Related papers (2024-01-03T21:24:44Z) - Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting, including multiple models, supporting several datasets.
We devise two types of uncertainty in the problem to increase performance and convey better trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z) - Enhancing Evaluation Methods for Infrared Small-Target Detection in Real-world Scenarios [2.6723845245975064]
Infrared small target detection (IRSTD) poses a significant challenge in the field of computer vision.
There has been a lack of extensive investigation into the evaluation metrics used for assessing their performance.
We employ a systematic approach to address this issue by first evaluating the effectiveness of existing metrics and then proposing new metrics to overcome the limitations of conventional ones.
arXiv Detail & Related papers (2023-01-10T05:40:28Z) - Unsupervised Anomaly Detection in Time-series: An Extensive Evaluation and Analysis of State-of-the-art Methods [10.618572317896515]
Unsupervised anomaly detection in time-series has been extensively investigated in the literature.
This paper proposes an in-depth evaluation study of recent unsupervised anomaly detection techniques in time-series.
arXiv Detail & Related papers (2022-12-06T15:05:54Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos:
Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2-20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.