Dynamic Erasing Network Based on Multi-Scale Temporal Features for
Weakly Supervised Video Anomaly Detection
- URL: http://arxiv.org/abs/2312.01764v1
- Date: Mon, 4 Dec 2023 09:40:11 GMT
- Title: Dynamic Erasing Network Based on Multi-Scale Temporal Features for
Weakly Supervised Video Anomaly Detection
- Authors: Chen Zhang, Guorong Li, Yuankai Qi, Hanhua Ye, Laiyun Qing, Ming-Hsuan
Yang, Qingming Huang
- Abstract summary: We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
- Score: 103.92970668001277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of weakly supervised video anomaly detection is to learn a detection
model using only video-level labeled data. However, prior studies typically
divide videos into fixed-length segments without considering the complexity or
duration of anomalies. Moreover, these studies usually just detect the most
abnormal segments, potentially overlooking the completeness of anomalies. To
address these limitations, we propose a Dynamic Erasing Network (DE-Net) for
weakly supervised video anomaly detection, which learns multi-scale temporal
features. Specifically, to handle duration variations of abnormal events, we
first propose a multi-scale temporal modeling module, capable of extracting
features from segments of varying lengths and capturing both local and global
visual information across different temporal scales. Then, we design a dynamic
erasing strategy, which dynamically assesses the completeness of the detected
anomalies and erases prominent abnormal segments in order to encourage the
model to discover gentle abnormal segments in a video. The proposed method
obtains favorable performance compared to several state-of-the-art approaches
on three datasets: XD-Violence, TAD, and UCF-Crime. Code will be made available
at https://github.com/ArielZc/DE-Net.
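The two ideas in the abstract — pooling segment features over windows of several lengths, and masking the most confidently anomalous segments so weaker ones can surface — can be illustrated with a minimal NumPy sketch. This is a conceptual illustration only, not the authors' DE-Net code; the function names, the scale set, and the erase ratio/threshold are all assumptions for illustration.

```python
# Conceptual sketch (not the authors' implementation): multi-scale temporal
# pooling over per-segment features, plus a dynamic erasing step that masks
# prominent anomalous segments so gentler ones can be discovered.
import numpy as np

def multi_scale_features(seg_feats, scales=(1, 2, 4)):
    """Average-pool segment features over windows of several lengths.

    seg_feats: (T, D) array of per-segment features.
    Returns a (T, D * len(scales)) array: each segment is described at
    every temporal scale (windows are truncated at the sequence end).
    """
    T, _ = seg_feats.shape
    outs = []
    for w in scales:
        pooled = np.stack([seg_feats[t:t + w].mean(axis=0) for t in range(T)])
        outs.append(pooled)
    return np.concatenate(outs, axis=1)

def dynamic_erase(scores, ratio=0.2, threshold=0.5):
    """Mask the top-scoring segments when they look confidently anomalous.

    scores: (T,) anomaly scores in [0, 1].
    Returns a boolean keep-mask over segments; erased segments would be
    hidden from the model in the next round so it must attend elsewhere.
    """
    T = len(scores)
    k = max(1, int(ratio * T))
    top = np.argsort(scores)[-k:]          # indices of the k highest scores
    keep = np.ones(T, dtype=bool)
    # Erase only segments whose score clears the confidence threshold,
    # so the strategy adapts to how complete the detection already is.
    keep[top[scores[top] > threshold]] = False
    return keep
```

For example, with 8 segments and `ratio=0.25`, only the two highest-scoring segments are candidates for erasure, and only if their scores exceed the threshold.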
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open-source and closed-source, on this benchmark and find that most of the models have difficulty identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z) - Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection [7.127829790714167]
Skeleton-based video anomaly detection (SVAD) is a crucial task in computer vision.
This paper introduces a novel, practical and lightweight framework, namely the Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection (GiCiSAD).
Experiments on four widely used skeleton-based video datasets show that GiCiSAD outperforms existing methods with significantly fewer training parameters.
arXiv Detail & Related papers (2024-03-18T18:42:32Z) - Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance by using only video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z) - Holistic Representation Learning for Multitask Trajectory Anomaly
Detection [65.72942351514956]
We propose a holistic representation of skeleton trajectories to learn expected motions across segments at different times.
We encode temporally occluded trajectories, jointly learn latent representations of the occluded segments, and reconstruct trajectories based on expected motions across different temporal segments.
arXiv Detail & Related papers (2023-11-03T11:32:53Z) - Spatio-temporal predictive tasks for abnormal event detection in videos [60.02503434201552]
We propose new constrained pretext tasks to learn object-level normality patterns.
Our approach consists of learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics.
Experiments on several benchmark datasets demonstrate the effectiveness of our approach to localize and track anomalies.
arXiv Detail & Related papers (2022-10-27T19:45:12Z) - Adaptive graph convolutional networks for weakly supervised anomaly
detection in videos [42.3118758940767]
We propose a weakly supervised adaptive graph convolutional network (WAGCN) to model the contextual relationships among video segments.
We fully consider the influence of other video segments on the current segment when generating the anomaly probability score for each segment.
arXiv Detail & Related papers (2022-02-14T06:31:34Z) - Weakly Supervised Video Anomaly Detection via Center-guided
Discriminative Learning [25.787860059872106]
Anomaly detection in surveillance videos is a challenging task due to the diversity of anomalous video content and duration.
We propose an anomaly detection framework, called Anomaly Regression Net (AR-Net), which only requires video-level labels in training stage.
Our method yields a new state-of-the-art result for video anomaly detection on ShanghaiTech dataset.
arXiv Detail & Related papers (2021-04-15T06:41:23Z) - Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit
Latent Features [8.407188666535506]
Most existing methods use an autoencoder to learn to reconstruct normal videos.
We propose an implicit two-path AE (ITAE), a structure in which two encoders implicitly model appearance and motion features.
For the complex distribution of normal scenes, we suggest normal density estimation of ITAE features.
The NF models improve ITAE performance by learning normality through the implicitly learned features.
arXiv Detail & Related papers (2020-10-15T05:02:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.