Related papers: Autoregressive Denoising Score Matching is a Good Video Anomaly Detector

URL: http://arxiv.org/abs/2506.23282v1
Date: Sun, 29 Jun 2025 15:14:32 GMT
Title: Autoregressive Denoising Score Matching is a Good Video Anomaly Detector
Authors: Hanwen Zhang, Congqi Cao, Qinyi Lv, Lingtong Min, Yanning Zhang
Abstract summary: Video anomaly detection (VAD) is an important computer vision problem. We introduce a noise-conditioned score transformer for denoising score matching. Then, we introduce a scene-dependent and motion-aware score function. We integrate unaffected visual information via a novel autoregressive denoising score matching mechanism.
Score: 36.96911195723131
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video anomaly detection (VAD) is an important computer vision problem. Thanks to the mode coverage capabilities of generative models, the likelihood-based paradigm is catching growing interest, as it can model normal distribution and detect out-of-distribution anomalies. However, these likelihood-based methods are blind to the anomalies located in local modes near the learned distribution. To handle these "unseen" anomalies, we dive into three gaps uniquely existing in VAD regarding scene, motion and appearance. Specifically, we first build a noise-conditioned score transformer for denoising score matching. Then, we introduce a scene-dependent and motion-aware score function by embedding the scene condition of input sequences into our model and assigning motion weights based on the difference between key frames of input sequences. Next, to solve the problem of blindness in principle, we integrate unaffected visual information via a novel autoregressive denoising score matching mechanism for inference. Through autoregressively injecting intensifying Gaussian noise into the denoised data and estimating the corresponding score function, we compare the denoised data with the original data to get a difference and aggregate it with the score function for an enhanced appearance perception and accumulate the abnormal context. With all three gaps considered, we can compute a more comprehensive anomaly indicator. Experiments on three popular VAD benchmarks demonstrate the state-of-the-art performance of our method.
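
Illustrative sketch (not the authors' implementation): the inference loop described in the abstract can be approximated on toy data as follows. Everything named here is an assumption made for illustration. `toy_score` is an analytic Gaussian score standing in for the paper's noise-conditioned score transformer, `dsm_loss` is the standard sigma-squared-weighted denoising score matching objective, and `autoregressive_anomaly_score` aggregates the score norm and the denoised-versus-original difference by a plain sum, which may differ from the paper's aggregation. Scene conditioning and motion weights are omitted.

```python
import numpy as np


def toy_score(x_noisy, sigma):
    """Analytic score of N(0, (1 + sigma^2) I); a stand-in for a learned score network."""
    return -x_noisy / (1.0 + sigma ** 2)


def dsm_loss(score_fn, x, sigma, rng):
    """Denoising score matching loss at one noise level, weighted by sigma^2."""
    noise = rng.standard_normal(x.shape)
    x_noisy = x + sigma * noise
    # Gradient of log N(x_noisy; x, sigma^2 I) with respect to x_noisy.
    target = -noise / sigma
    diff = score_fn(x_noisy, sigma) - target
    return 0.5 * sigma ** 2 * np.mean(np.sum(diff ** 2, axis=-1))


def autoregressive_anomaly_score(score_fn, x, sigmas, rng):
    """Sum score norms and denoising differences over increasing noise levels."""
    current = x.copy()
    total = np.zeros(x.shape[0])
    for sigma in sigmas:
        # Inject noise into the previously denoised data (the autoregressive step).
        x_noisy = current + sigma * rng.standard_normal(x.shape)
        score = score_fn(x_noisy, sigma)
        # One-step denoising via Tweedie's formula.
        current = x_noisy + sigma ** 2 * score
        # Compare the denoised data with the original input.
        diff = np.linalg.norm(current - x, axis=-1)
        total += np.linalg.norm(score, axis=-1) + diff
    return total


rng = np.random.default_rng(0)
normal = rng.standard_normal((256, 8))   # samples from the "normal" distribution
anomalous = normal + 6.0                 # shifted samples play the role of anomalies
sigmas = [0.1, 0.2, 0.4]                 # intensifying noise schedule (assumed values)

loss = dsm_loss(toy_score, normal, 0.2, rng)
normal_scores = autoregressive_anomaly_score(toy_score, normal, sigmas, rng)
anomalous_scores = autoregressive_anomaly_score(toy_score, anomalous, sigmas, rng)
```

In this toy setting the shifted samples receive larger accumulated scores than the in-distribution samples, which is the behaviour an anomaly indicator of this kind is meant to show.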

Related papers

Video Anomaly Detection with Structured Keywords [0.0]
This paper focuses on detecting anomalies in surveillance video using keywords by leveraging foundational models' feature representation generalization capabilities. We present a novel, lightweight pipeline for anomaly classification using keyword weights. We achieve comparable performance on the three benchmarks Ped2, Shanghai Tech, and CUHK Avenue, with ROC AUC scores of 0.865, 0.745, and 0.742, respectively.
arXiv Detail & Related papers (2025-03-07T20:05:59Z)
Anomaly Detection via Autoencoder Composite Features and NCE [1.2891210250935148]
Autoencoders (AEs) or generative models are often employed to model the data distribution of normal inputs. We propose a decoupled training approach for anomaly detection that uses both an AE and a likelihood model trained with noise contrastive estimation (NCE).
arXiv Detail & Related papers (2025-02-04T01:29:22Z)
Towards Zero-shot 3D Anomaly Localization [58.62650061201283]
3DzAL is a novel patch-level contrastive learning framework for 3D anomaly detection and localization. We show that 3DzAL outperforms state-of-the-art methods in anomaly detection and localization.
arXiv Detail & Related papers (2024-12-05T16:25:27Z)
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection [15.72443573134312]
We treat feature vectors extracted from videos as realizations of a random variable with a fixed distribution. We train our video anomaly detector using a modification of denoising score matching. Our experiments on five popular video anomaly detection benchmarks demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2024-03-21T15:46:19Z)
Dynamic Addition of Noise in a Diffusion Model for Anomaly Detection [2.209921757303168]
Diffusion models have found valuable applications in anomaly detection by capturing the nominal data distribution and identifying anomalies via reconstruction. Despite their merits, they struggle to localize anomalies of varying scales, especially larger anomalies such as entire missing components. We present a novel framework that enhances the capability of diffusion models by extending the previously introduced implicit conditioning approach (Meng et al., 2022) in three significant ways.
arXiv Detail & Related papers (2024-01-09T09:57:38Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal. Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos. This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations [17.816344808780965]
The unsupervised video anomaly detection (VAD) problem involves classifying each frame in a video as normal or abnormal, without any access to labels. To accomplish this, the proposed method employs conditional diffusion models, where the input data are features extracted from a pre-trained network. Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events.
arXiv Detail & Related papers (2023-07-04T07:36:48Z)
The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images. Unsupervised anomaly detection approaches have been proposed using only normal data for training. We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)
Unsupervised Visual Defect Detection with Score-Based Generative Model [17.610722842950555]
We focus on the unsupervised visual defect detection and localization tasks. We propose a novel framework based on the recent score-based generative models. We evaluate our method on several datasets to demonstrate its effectiveness.
arXiv Detail & Related papers (2022-11-29T11:06:29Z)
Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated. We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.