Language-guided Open-world Video Anomaly Detection under Weak Supervision
- URL: http://arxiv.org/abs/2503.13160v2
- Date: Thu, 30 Oct 2025 06:21:47 GMT
- Title: Language-guided Open-world Video Anomaly Detection under Weak Supervision
- Authors: Zihao Liu, Xiaoyu Wu, Jianqin Wu, Xuxu Wang, Linlin Yang
- Abstract summary: Video anomaly detection (VAD) aims to detect anomalies that deviate from what is expected. Existing methods assume that the definition of anomalies is invariable, and thus are not applicable to the open world. We propose a novel open-world VAD paradigm with variable definitions, allowing guided detection through user-provided natural language at inference time.
- Score: 27.912128185225054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection (VAD) aims to detect anomalies that deviate from what is expected. In open-world scenarios, the expected events may change as requirements change. For example, not wearing a mask may be considered abnormal during a flu outbreak but normal otherwise. However, existing methods assume that the definition of anomalies is invariable, and thus are not applicable to the open world. To address this, we propose a novel open-world VAD paradigm with variable definitions, allowing guided detection through user-provided natural language at inference time. This paradigm necessitates establishing a robust mapping from video and textual definition to anomaly scores. Therefore, we propose LaGoVAD (Language-guided Open-world Video Anomaly Detector), a model that dynamically adapts anomaly definitions under weak supervision with two regularization strategies: diversifying the relative durations of anomalies via dynamic video synthesis, and enhancing feature robustness through contrastive learning with negative mining. Training such adaptable models requires diverse anomaly definitions, but existing datasets typically provide labels without semantic descriptions. To bridge this gap, we collect PreVAD (Pre-training Video Anomaly Dataset), the largest and most diverse video anomaly dataset to date, featuring 35,279 annotated videos with multi-level category labels and descriptions that explicitly define anomalies. Zero-shot experiments on seven datasets demonstrate LaGoVAD's SOTA performance. Our dataset and code will be released at https://github.com/Kamino666/LaGoVAD-PreVAD.
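The abstract's second regularization strategy, contrastive learning with negative mining, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the cosine/InfoNCE-style formulation, and the "keep the k most similar non-matching definitions" hardness criterion are all assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def contrastive_loss_with_negative_mining(video_emb, text_emb, k=2, temperature=0.1):
    """InfoNCE-style loss where, for each video, only the k hardest
    (most similar) non-matching text definitions are kept as negatives.
    video_emb, text_emb: (N, D) arrays; row i of each is a matched pair."""
    sim = cosine_sim(video_emb, text_emb) / temperature  # (N, N) logits
    n = sim.shape[0]
    losses = []
    for i in range(n):
        pos = sim[i, i]                      # matched definition
        negs = np.delete(sim[i], i)          # all non-matching definitions
        hard = np.sort(negs)[-k:]            # mine the k hardest negatives
        logits = np.concatenate(([pos], hard))
        # cross-entropy with the positive at index 0
        losses.append(-pos + np.log(np.exp(logits).sum()))
    return float(np.mean(losses))
```

Restricting the denominator to mined hard negatives focuses the gradient on the definitions the model currently confuses with the true one, which is one common way to make video-text features more robust.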
Related papers
- Weakly Supervised Video Anomaly Detection with Anomaly-Connected Components and Intention Reasoning [23.043341269626016]
We propose a novel framework named LAS-VAD, short for Learning Anomaly Semantics for WS-VAD. Our framework integrates an anomaly-connected component mechanism and an intention-awareness mechanism. It outperforms current state-of-the-art methods with remarkable gains.
arXiv Detail & Related papers (2026-02-28T08:57:33Z)
- No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection [15.949619310702579]
Existing video anomaly detection methods underperform in open-world scenarios. Key contributing factors include limited dataset diversity and an inadequate understanding of context-dependent anomalous semantics. We propose LAVIDA, an end-to-end zero-shot video anomaly detection framework.
arXiv Detail & Related papers (2026-02-22T16:03:43Z)
- Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline [63.96226274616927]
A new framework called Track Any Anomalous Object (TAO) introduces a granular video anomaly detection pipeline. Unlike methods that assign anomaly scores to every pixel, our approach transforms the problem into pixel-level tracking of anomalous objects. Experiments demonstrate that TAO sets new benchmarks in accuracy and robustness.
arXiv Detail & Related papers (2025-06-05T15:49:39Z)
- Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection [88.34095233600719]
FAPrompt is a novel framework designed to learn Fine-grained Abnormality Prompts for more accurate ZSAD.
It substantially outperforms state-of-the-art methods by at least 3%-5% AUC/AP in both image- and pixel-level ZSAD tasks.
arXiv Detail & Related papers (2024-10-14T08:41:31Z)
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmark and find that most models have difficulty effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con$$, which learns through context augmentations.
arXiv Detail & Related papers (2024-05-29T07:59:06Z)
- Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection [16.77262005540559]
A novel framework is proposed to guide the learning of suspected anomalies from event prompts.
It enables a new multi-prompt learning process to constrain the visual-semantic features across all videos.
Our proposed model outperforms most state-of-the-art methods in terms of AP or AUC.
arXiv Detail & Related papers (2024-03-02T10:42:47Z)
- BatchNorm-based Weakly Supervised Video Anomaly Detection [117.11382325721016]
In weakly supervised video anomaly detection, temporal features of abnormal events often exhibit outlier characteristics.
We propose a novel method, BN-WVAD, which incorporates BatchNorm into WVAD.
The proposed BN-WVAD model demonstrates state-of-the-art performance on UCF-Crime with an AUC of 87.24%, and XD-Violence, where AP reaches up to 84.93%.
arXiv Detail & Related papers (2023-11-26T17:47:57Z)
- Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance by utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
- UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection [103.06327681038304]
We propose a supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection.
Unlike existing datasets, we introduce abnormal events annotated at the pixel level at training time.
We show that UBnormal can enhance the performance of a state-of-the-art anomaly detection framework.
arXiv Detail & Related papers (2021-11-16T17:28:46Z)
- Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models.
The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability.
Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
arXiv Detail & Related papers (2021-08-01T14:33:17Z)
- Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features [8.407188666535506]
Most existing methods use an autoencoder to learn to reconstruct normal videos.
We propose an implicit two-path AE (ITAE), a structure in which two encoders implicitly model appearance and motion features.
For the complex distribution of normal scenes, we suggest normal density estimation of ITAE features.
NF models enhance ITAE performance by learning normality through the implicitly learned features.
arXiv Detail & Related papers (2020-10-15T05:02:02Z)
- Localizing Anomalies from Weakly-Labeled Videos [45.58643708315132]
We propose a Weakly-Supervised Anomaly Localization (WSAL) method focusing on temporally localizing anomalous segments within anomalous videos.
Inspired by the appearance difference in anomalous videos, the evolution of adjacent temporal segments is evaluated for the localization of anomalous segments.
Our proposed method achieves new state-of-the-art performance on the UCF-Crime and TAD datasets.
arXiv Detail & Related papers (2020-08-20T12:58:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.