Open-Vocabulary Video Anomaly Detection
- URL: http://arxiv.org/abs/2311.07042v3
- Date: Wed, 13 Mar 2024 10:57:00 GMT
- Title: Open-Vocabulary Video Anomaly Detection
- Authors: Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang,
Yanning Zhang
- Abstract summary: Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
- Score: 57.552523669351636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection (VAD) with weak supervision has achieved remarkable
performance in utilizing video-level labels to discriminate whether a video
frame is normal or abnormal. However, current approaches are inherently limited
to a closed-set setting and may struggle in open-world applications where there
can be anomaly categories in the test data unseen during training. A few recent
studies attempt to tackle a more realistic setting, open-set VAD, which aims to
detect unseen anomalies given seen anomalies and normal videos. However, such a
setting focuses on predicting frame anomaly scores, having no ability to
recognize the specific categories of anomalies, despite the fact that this
ability is essential for building more informed video surveillance systems.
This paper takes a step further and explores open-vocabulary video anomaly
detection (OVVAD), in which we aim to leverage pre-trained large models to
detect and categorize seen and unseen anomalies. To this end, we propose a
model that decouples OVVAD into two mutually complementary tasks --
class-agnostic detection and class-specific classification -- and jointly
optimizes both tasks. Particularly, we devise a semantic knowledge injection
module to introduce semantic knowledge from large language models for the
detection task, and design a novel anomaly synthesis module to generate pseudo
unseen anomaly videos with the help of large vision generation models for the
classification task. These semantic knowledge and synthesis anomalies
substantially extend our model's capability in detecting and categorizing a
variety of seen and unseen anomalies. Extensive experiments on three
widely-used benchmarks demonstrate our model achieves state-of-the-art
performance on OVVAD task.
Related papers
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open and closed sources, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z) - Dynamic Erasing Network Based on Multi-Scale Temporal Features for
Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z) - A Coarse-to-Fine Pseudo-Labeling (C2FPL) Framework for Unsupervised
Video Anomaly Detection [4.494911384096143]
Detection of anomalous events in videos is an important problem in applications such as surveillance.
We propose a simple-but-effective two-stage pseudo-label generation framework that produces segment-level (normal/anomaly) pseudo-labels.
The proposed coarse-to-fine pseudo-label generator employs carefully-designed hierarchical divisive clustering and statistical hypothesis testing.
arXiv Detail & Related papers (2023-10-26T17:59:19Z) - A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection [14.089888316857426]
This paper focuses on weakly supervised video anomaly detection.
We develop a lightweight video anomaly detection model.
We show that our model can achieve comparable or even superior AUC score compared to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T01:23:08Z) - Object-centric and memory-guided normality reconstruction for video
anomaly detection [56.64792194894702]
This paper addresses anomaly detection problem for videosurveillance.
Due to the inherent rarity and heterogeneity of abnormal events, the problem is viewed as a normality modeling strategy.
Our model learns object-centric normal patterns without seeing anomalous samples during training.
arXiv Detail & Related papers (2022-03-07T19:28:39Z) - Approaches Toward Physical and General Video Anomaly Detection [0.0]
Anomaly detection in videos may enable automatic detection of malfunctions in many manufacturing, maintenance, and real-life settings.
We introduce the Physical Anomalous Trajectory or Motion dataset, which contains six different video classes.
We suggest an even harder benchmark where anomalous activities should be spotted on highly variable scenes.
arXiv Detail & Related papers (2021-12-14T18:57:44Z) - UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection [103.06327681038304]
We propose a supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection.
Unlike existing data sets, we introduce abnormal events annotated at the pixel level at training time.
We show that UBnormal can enhance the performance of a state-of-the-art anomaly detection framework.
arXiv Detail & Related papers (2021-11-16T17:28:46Z) - Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models.
The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability.
Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
arXiv Detail & Related papers (2021-08-01T14:33:17Z) - Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit
Latent Features [8.407188666535506]
Most existing methods use an autoencoder to learn to reconstruct normal videos.
We propose an implicit two-path AE (ITAE), a structure in which two encoders implicitly model appearance and motion features.
For the complex distribution of normal scenes, we suggest normal density estimation of ITAE features.
NF models intensify ITAE performance by learning normality through implicitly learned features.
arXiv Detail & Related papers (2020-10-15T05:02:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.