Unsupervised Video Anomaly Detection with Diffusion Models Conditioned
on Compact Motion Representations
- URL: http://arxiv.org/abs/2307.01533v2
- Date: Wed, 19 Jul 2023 06:39:36 GMT
- Title: Unsupervised Video Anomaly Detection with Diffusion Models Conditioned
on Compact Motion Representations
- Authors: Anil Osman Tur and Nicola Dall'Asen and Cigdem Beyan and Elisa Ricci
- Abstract summary: The unsupervised video anomaly detection (VAD) problem involves classifying each frame in a video as normal or abnormal, without any access to labels.
To accomplish this, the proposed method employs conditional diffusion models, where the input data consists of features extracted from a pre-trained network.
Our method utilizes a data-driven threshold and treats a high reconstruction error as an indicator of anomalous events.
- Score: 17.816344808780965
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper aims to address the unsupervised video anomaly detection (VAD)
problem, which involves classifying each frame in a video as normal or
abnormal, without any access to labels. To accomplish this, the proposed method
employs conditional diffusion models, where the input data is the
spatiotemporal features extracted from a pre-trained network, and the condition
is the features extracted from compact motion representations that summarize a
given video segment in terms of its motion and appearance. Our method utilizes
a data-driven threshold and considers a high reconstruction error as an
indicator of anomalous events. This study is the first to utilize compact
motion representations for VAD, and the experiments conducted on two large-scale
VAD benchmarks demonstrate that they supply relevant information to the
diffusion model, consequently improving VAD performance with respect to the prior art.
Importantly, our method exhibits better generalization performance across
different datasets, notably outperforming both the state-of-the-art and
baseline methods. The code of our method is available at
https://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusion
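For readers who want a concrete picture of the pipeline described in the abstract (pre-extracted features as the diffusion input, compact motion representations as the condition, and a data-driven threshold on reconstruction error), the snippet below is a minimal sketch under stated assumptions, not the authors' implementation. The tensor sizes, the toy MLP denoiser, the fixed noise level, and the percentile threshold are all hypothetical; consult the repository linked above for the actual model and training code.

```python
# Minimal sketch (NOT the authors' implementation) of the idea described above:
# noise the pre-extracted spatiotemporal features, denoise them with a model
# conditioned on a compact motion representation, and flag segments whose
# reconstruction error exceeds a data-driven threshold.
import torch
import torch.nn as nn

FEAT_DIM = 1024   # assumed size of spatiotemporal features from a pre-trained backbone
COND_DIM = 256    # assumed size of compact-motion-representation features


class ConditionalDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to a feature vector, conditioned
    on a motion embedding and the diffusion timestep."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + COND_DIM + 1, 512),
            nn.SiLU(),
            nn.Linear(512, FEAT_DIM),
        )

    def forward(self, noisy_feat, cond, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0      # crude timestep embedding
        return self.net(torch.cat([noisy_feat, cond, t_emb], dim=-1))


@torch.no_grad()
def anomaly_scores(model, feats, conds, t_fixed=500, alpha=0.5):
    """Score = reconstruction error of the features after one noising/denoising
    step at a fixed timestep (alpha is a placeholder noise-schedule value)."""
    t = torch.full((feats.size(0),), t_fixed)
    noise = torch.randn_like(feats)
    noisy = alpha ** 0.5 * feats + (1 - alpha) ** 0.5 * noise
    pred_noise = model(noisy, conds, t)
    recon = (noisy - (1 - alpha) ** 0.5 * pred_noise) / alpha ** 0.5
    return ((recon - feats) ** 2).mean(dim=-1)        # per-segment MSE


def flag_anomalies(scores, percentile=90.0):
    """Data-driven threshold: a high percentile of the scores themselves."""
    thr = torch.quantile(scores, percentile / 100.0)
    return scores > thr, thr


if __name__ == "__main__":
    model = ConditionalDenoiser()
    feats = torch.randn(32, FEAT_DIM)   # dummy segment features
    conds = torch.randn(32, COND_DIM)   # dummy motion-representation features
    scores = anomaly_scores(model, feats, conds)
    flags, thr = flag_anomalies(scores)
    print(f"threshold={thr.item():.4f}, flagged={int(flags.sum())}/{len(flags)}")
```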
Related papers
- Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection [14.315287192621662]
Video anomaly detection (VAD) often learns the distribution of normal samples and detects anomalies by measuring significant deviations from it.
Most VAD methods cannot cope with cross-dataset validation on new target domains.
We propose a novel VAD method with a motion-guided memory module to achieve zero-shot cross-dataset validation.
arXiv Detail & Related papers (2024-09-26T07:48:20Z)
- Don't Judge by the Look: Towards Motion Coherent Video Representation [56.09346222721583]
Motion Coherent Augmentation (MCA) is a data augmentation method for video understanding.
MCA introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns, rather than static appearances.
arXiv Detail & Related papers (2024-03-14T15:53:04Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z)
- Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance by utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
- A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection [14.089888316857426]
This paper focuses on weakly supervised video anomaly detection.
We develop a lightweight video anomaly detection model.
We show that our model can achieve AUC scores comparable or even superior to those of state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T01:23:08Z)
- Exploring Diffusion Models for Unsupervised Video Anomaly Detection [17.816344808780965]
This paper investigates the performance of diffusion models for video anomaly detection (VAD).
Experiments performed on two large-scale anomaly detection datasets demonstrate the consistent improvement of the proposed method over the state-of-the-art generative models.
This is the first study to use a diffusion model for VAD and to provide guidance for examining it in surveillance scenarios.
arXiv Detail & Related papers (2023-04-12T13:16:07Z)
- Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection [15.991784541576788]
Existing approaches, whether oriented toward video-level or segment-level labels, mainly focus on extracting representations of anomalous data.
We propose an Uncertainty Regulated Dual Memory Units (UR-DMU) model to learn both the representations of normal data and discriminative features of abnormal data.
Our method outperforms the state-of-the-art methods by a sizable margin.
arXiv Detail & Related papers (2023-02-10T10:39:40Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains a frame-level AUROC of 88.3% on the CUHK Avenue dataset (a sketch of frame-level AUROC evaluation follows this list).
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features [8.407188666535506]
Most existing methods use an autoencoder to learn to reconstruct normal videos.
We propose an implicit two-path AE (ITAE), a structure in which two encoders implicitly model appearance and motion features.
To capture the complex distribution of normal scenes, we estimate the density of ITAE features with normalizing flow (NF) models.
The NF models strengthen ITAE performance by learning normality through the implicitly learned features.
arXiv Detail & Related papers (2020-10-15T05:02:02Z)
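Several of the entries above report frame-level AUROC or AUC as their headline metric (e.g., 88.3% on CUHK Avenue). As a reference, the snippet below is a minimal, generic sketch of how such a frame-level score is commonly computed from per-frame anomaly scores and binary ground-truth labels with scikit-learn; the arrays and values are dummy illustrations, not data from any of the listed papers.

```python
# Minimal sketch: frame-level AUROC for video anomaly detection.
# Per-frame anomaly scores from all test videos are concatenated and compared
# against binary ground-truth labels (1 = anomalous frame, 0 = normal frame).
# Note: some works instead average a per-video AUROC; conventions vary by paper.
import numpy as np
from sklearn.metrics import roc_auc_score

def frame_level_auroc(scores_per_video, labels_per_video):
    """Concatenate per-video score/label arrays and compute a single AUROC."""
    scores = np.concatenate(scores_per_video)
    labels = np.concatenate(labels_per_video)
    return roc_auc_score(labels, scores)

if __name__ == "__main__":
    # Dummy data: two short test videos with per-frame scores and labels.
    scores = [np.array([0.1, 0.2, 0.8, 0.9]), np.array([0.05, 0.1, 0.15])]
    labels = [np.array([0, 0, 1, 1]), np.array([0, 0, 0])]
    print(f"frame-level AUROC: {frame_level_auroc(scores, labels):.3f}")
```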