MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
- URL: http://arxiv.org/abs/2403.14497v1
- Date: Thu, 21 Mar 2024 15:46:19 GMT
- Title: MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
- Authors: Jakub Micorek, Horst Possegger, Dominik Narnhofer, Horst Bischof, Mateusz Kozinski
- Abstract summary: We treat feature vectors extracted from videos as realizations of a random variable with a fixed distribution.
We train our video anomaly detector using a modification of denoising score matching.
Our experiments on five popular video anomaly detection benchmarks demonstrate state-of-the-art performance.
- Score: 15.72443573134312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach to video anomaly detection: we treat feature vectors extracted from videos as realizations of a random variable with a fixed distribution and model this distribution with a neural network. This lets us estimate the likelihood of test videos and detect video anomalies by thresholding the likelihood estimates. We train our video anomaly detector using a modification of denoising score matching, a method that injects training data with noise to facilitate modeling its distribution. To eliminate hyperparameter selection, we model the distribution of noisy video features across a range of noise levels and introduce a regularizer that tends to align the models for different levels of noise. At test time, we combine anomaly indications at multiple noise scales with a Gaussian mixture model. Running our video anomaly detector induces minimal delays as inference requires merely extracting the features and forward-propagating them through a shallow neural network and a Gaussian mixture model. Our experiments on five popular video anomaly detection benchmarks demonstrate state-of-the-art performance, both in the object-centric and in the frame-centric setup.
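The recipe in the abstract can be illustrated with a toy sketch: denoising score matching at several noise levels, followed by a Gaussian model over the per-scale scores. This is not the authors' implementation, and it rests on several assumptions:

- The shallow network is replaced by a quadratic log-density, log p_sigma(x) ~ -0.5 * w * ||x - mu||^2 + const. Its score is linear, so the DSM loss becomes ordinary least squares.
- The alignment regularizer is omitted.
- A single Gaussian (a one-component GMM) stands in for the Gaussian mixture model.
- Synthetic 2-D Gaussian points stand in for extracted video features.

All function names (`fit_dsm_quadratic`, `multiscale_features`, `fit_detector`, `anomaly_score`) are hypothetical.

```python
import numpy as np


def fit_dsm_quadratic(x, sigma, rng):
    """Fit log p_sigma(x) ~ -0.5 * w * ||x - mu||^2 by denoising score matching.

    Simplifying assumption (not the paper's network): the model score is
    -w * x + b with b = w * mu, so the DSM loss
    E || score(x + sigma * eps) + eps / sigma ||^2 is a least-squares problem.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    target = -eps / sigma  # score of the Gaussian corruption kernel at x_noisy
    # Least squares with one shared slope (-w) and a per-dimension intercept b_j.
    xc = x_noisy - x_noisy.mean(axis=0)
    tc = target - target.mean(axis=0)
    w = -np.sum(xc * tc) / np.sum(xc * xc)
    b = target.mean(axis=0) + w * x_noisy.mean(axis=0)
    return w, b / w  # (scale w, center mu)


def multiscale_features(x, models):
    """Stack the unnormalized negative log-densities -log p_sigma(x), one column per noise level."""
    return np.stack(
        [0.5 * w * np.sum((x - mu) ** 2, axis=1) for w, mu in models], axis=1
    )


def fit_detector(x_train, sigmas, seed=0):
    """Fit one DSM model per noise level, then a Gaussian over the per-scale scores."""
    rng = np.random.default_rng(seed)
    models = [fit_dsm_quadratic(x_train, s, rng) for s in sigmas]
    feats = multiscale_features(x_train, models)
    # A single Gaussian stands in for the paper's Gaussian mixture model.
    mean = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(len(sigmas))
    return models, mean, cov


def anomaly_score(x, models, mean, cov):
    """Negative Gaussian log-likelihood of the multiscale feature vector (higher = more anomalous)."""
    diff = multiscale_features(x, models) - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    mahal = np.einsum("ni,ij,nj->n", diff, inv, diff)
    return 0.5 * (mahal + logdet + len(mean) * np.log(2.0 * np.pi))


# Usage on synthetic stand-in "normal" features centered at (2, -1).
data_rng = np.random.default_rng(1)
x_train = data_rng.normal(loc=[2.0, -1.0], scale=1.0, size=(20000, 2))
models, mean, cov = fit_detector(x_train, sigmas=(0.25, 0.5, 1.0))
# An inlier at the data center and a far-away point; the second should score higher.
scores = anomaly_score(np.array([[2.0, -1.0], [8.0, 5.0]]), models, mean, cov)
print(scores)
```

The corruption kernel is Gaussian, so the DSM regression target is -eps / sigma. For Gaussian data with variance s^2, the fitted w should therefore approach 1 / (s^2 + sigma^2). Small noise levels make this target high-variance, which is why the toy example uses many samples.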
Related papers
- Conditional score-based diffusion models for solving inverse problems in mechanics [6.319616423658121]
We propose a framework to perform Bayesian inference using conditional score-based diffusion models.
Conditional score-based diffusion models are generative models that learn to approximate the score function of a conditional distribution.
We demonstrate the efficacy of the proposed approach on a suite of high-dimensional inverse problems in mechanics.
(2024-06-19T02:09:15Z)
- Diffusion Gaussian Mixture Audio Denoise [23.760755498636943]
We propose a DiffGMM model, a denoising model based on the diffusion and Gaussian mixture models.
Given a noisy audio signal, we first apply a 1D-U-Net to extract features and train linear layers to estimate parameters for the Gaussian mixture model.
The estimated noise is continuously subtracted from the noisy signal to output clean audio.
(2024-06-13T14:18:10Z)
- Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account.
Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow.
We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
(2024-02-07T14:59:25Z)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation [88.49030739715701]
This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis.
Experiments on various datasets confirm that the approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
(2023-03-15T02:16:39Z)
- The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearances in brain images.
Unsupervised anomaly detection approaches have been proposed using only normal data for training.
We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
(2023-01-19T21:39:38Z)
- MANet: Improving Video Denoising with a Multi-Alignment Network [72.93429911044903]
We present a multi-alignment network, which generates multiple flow proposals followed by attention-based averaging.
Experiments on a large-scale video dataset demonstrate that our method improves the denoising baseline model by 0.2 dB.
(2022-02-20T00:52:07Z)
- Robust Unsupervised Multi-Object Tracking in Noisy Environments [5.409476600348953]
We introduce a robust unsupervised multi-object tracking (MOT) model: AttU-Net.
The proposed single-head attention model helps limit the negative impact of noise by learning visual representations at different segment scales.
We evaluate our method on the MNIST and Atari game video benchmarks.
(2021-05-20T19:38:03Z)
- Speech Prediction in Silent Videos using Variational Autoencoders [29.423462898526605]
We present a model for generating speech in a silent video.
The proposed model combines recurrent neural networks and variational deep generative models to learn the conditional distribution of the audio.
We demonstrate the performance of our model on the GRID dataset based on standard benchmarks.
(2020-11-14T17:09:03Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
(2020-11-05T11:34:12Z)
- Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior [63.11478060678794]
We propose an effective motion-excited sampler to obtain motion-aware noise prior.
By using the sparked prior in gradient estimation, we can successfully attack a variety of video classification models with fewer queries.
(2020-03-17T10:54:12Z)