Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection
- URL: http://arxiv.org/abs/2412.03044v2
- Date: Thu, 27 Mar 2025 05:03:14 GMT
- Title: Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection
- Authors: Xiaofeng Tan, Hongsong Wang, Xin Geng, Liang Wang
- Abstract summary: Video anomaly detection (VAD) is a vital yet complex open-set task in computer vision. We introduce a novel frequency-guided diffusion model with perturbation training. We employ the 2D Discrete Cosine Transform (DCT) to separate high-frequency (local) and low-frequency (global) motion components.
- Score: 43.49146665908238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video anomaly detection (VAD) is a vital yet complex open-set task in computer vision, commonly tackled through reconstruction-based methods. However, these methods struggle with two key limitations: (1) insufficient robustness in open-set scenarios, where unseen normal motions are frequently misclassified as anomalies, and (2) an overemphasis on, but restricted capacity for, reconstructing local motions, which are inherently difficult to capture accurately due to their diversity. To overcome these challenges, we introduce a novel frequency-guided diffusion model with perturbation training. First, we enhance robustness by training a generator to produce perturbed samples, which are similar to normal samples and target the weaknesses of the reconstruction model. This training paradigm expands the reconstruction domain of the model, improving its generalization to unseen normal motions. Second, to address the overemphasis on motion details, we employ the 2D Discrete Cosine Transform (DCT) to separate high-frequency (local) and low-frequency (global) motion components. By guiding the diffusion model with observed high-frequency information, we prioritize the reconstruction of low-frequency components, enabling more accurate and robust anomaly detection. Extensive experiments on five widely used VAD datasets demonstrate that our approach surpasses state-of-the-art methods, underscoring its effectiveness in open-set scenarios and diverse motion contexts. Our project website is https://xiaofeng-tan.github.io/projects/FG-Diff/index.html.
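For a concrete picture of the frequency split described in the abstract, the sketch below separates a skeleton sequence into low- and high-frequency parts with a 2D DCT. The (T, J) layout, the band cutoff, and the random data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy.fft import dctn, idctn

def split_motion_frequencies(pose_seq: np.ndarray, cutoff: int = 4):
    """Split a motion sequence into low- and high-frequency components.

    pose_seq: (T, J) array of T frames by J flattened joint coordinates.
    cutoff:   illustrative number of low-frequency DCT bands to keep.
    """
    coeffs = dctn(pose_seq, norm="ortho")      # 2D DCT over time and joints
    low_mask = np.zeros_like(coeffs)
    low_mask[:cutoff, :cutoff] = 1.0           # keep only the lowest bands
    low = idctn(coeffs * low_mask, norm="ortho")           # global motion
    high = idctn(coeffs * (1.0 - low_mask), norm="ortho")  # local detail
    return low, high

# Example: 24 frames of 17 joints in 2D (34 values per frame)
seq = np.random.randn(24, 34)
low, high = split_motion_frequencies(seq)
assert np.allclose(low + high, seq)  # the split is an exact decomposition
```

Because the DCT is linear and orthonormal, the two components sum back to the original sequence exactly, which is what lets the model be guided by one band while reconstructing the other.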
Related papers
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full and no-reference metrics.
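As a rough illustration of what "a single step" means here (not OSDD's actual architecture), a noise-prediction network's output can be converted to a clean estimate in closed form:

```python
import torch

def one_step_x0(x_t: torch.Tensor, eps_pred: torch.Tensor,
                alpha_bar_t: torch.Tensor) -> torch.Tensor:
    """Invert x_t = sqrt(abar_t)*x_0 + sqrt(1-abar_t)*eps for x_0.

    One network evaluation (producing eps_pred) yields a complete
    restoration estimate, skipping the iterative reverse chain.
    """
    return (x_t - (1.0 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
```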
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection [12.100563798908777]
Video Anomaly Detection (VAD) is essential for computer vision research.
Existing VAD methods utilize either reconstruction-based or prediction-based frameworks.
We address pose-based video anomaly detection and introduce a novel framework called Dual Conditioned Motion Diffusion.
arXiv Detail & Related papers (2024-12-23T01:31:39Z)
- Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking [52.393359791978035]
Motion2VecSets is a 4D diffusion model for dynamic surface reconstruction from point cloud sequences.
We parameterize 4D dynamics with latent sets instead of using global latent codes.
For more temporally-coherent object tracking, we synchronously denoise deformation latent sets and exchange information across multiple frames.
arXiv Detail & Related papers (2024-01-12T15:05:08Z)
- Dynamic Addition of Noise in a Diffusion Model for Anomaly Detection [2.209921757303168]
Diffusion models have found valuable applications in anomaly detection by capturing the nominal data distribution and identifying anomalies via reconstruction.
Despite their merits, they struggle to localize anomalies of varying scales, especially larger anomalies such as entire missing components.
We present a novel framework that enhances the capability of diffusion models by extending the previously introduced implicit conditioning approach of Meng et al. (2022) in three significant ways.
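Implicit conditioning in the sense of Meng et al. (2022) means noising the input only partway and denoising from there. A minimal DDPM-style sketch, with `model` an assumed noise-prediction network:

```python
import torch

@torch.no_grad()
def partially_noised_reconstruction(x, model, betas, t_start):
    """SDEdit-style implicit conditioning, sketched for anomaly detection.

    Noise the input only up to t_start < T, then run the learned reverse
    process back to t = 0. Starting mid-chain preserves the input's coarse
    structure while pulling it toward the nominal distribution, so
    anomalous regions reconstruct poorly.
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    # Partial forward process: sample q(x_t | x_0) at t = t_start.
    x_t = (alpha_bar[t_start].sqrt() * x
           + (1 - alpha_bar[t_start]).sqrt() * torch.randn_like(x))
    # DDPM ancestral sampling from t_start down to 0.
    for t in range(t_start, -1, -1):
        eps_pred = model(x_t, t)  # assumed noise-prediction interface
        mean = (x_t - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_pred) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x_t = mean + betas[t].sqrt() * noise
    return x_t  # anomaly score could then be, e.g., (x - x_t).abs()
```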
arXiv Detail & Related papers (2024-01-09T09:57:38Z)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [94.10861578387443]
We explore the inference process of two mainstream T2V models, one based on transformers and one on diffusion.
We propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights.
Extensive experiments on three datasets using a classic transformer-based model CogVideo and a typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning.
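The summary does not spell out F3-Pruning's criterion; purely as a generic stand-in, training-free pruning of an attention weight tensor can look like this magnitude-based variant:

```python
import torch

def magnitude_prune(attn_weight: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Training-free magnitude pruning of an attention weight tensor.

    Zeroes the smallest-magnitude entries, keeping the top `keep_ratio`
    fraction. A generic stand-in, not F3-Pruning's actual criterion.
    """
    n = attn_weight.numel()
    k = max(1, int(n * keep_ratio))
    threshold = attn_weight.abs().flatten().kthvalue(n - k + 1).values
    return torch.where(attn_weight.abs() >= threshold,
                       attn_weight, torch.zeros_like(attn_weight))
```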
arXiv Detail & Related papers (2023-12-06T12:34:47Z)
- Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic spatio-temporal pseudo-anomalies (PAs) by inpainting a masked-out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting.
Our method performs on par with existing state-of-the-art PA-generation and reconstruction-based methods under the OCC setting.
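As a toy version of the pseudo-anomaly idea (the paper inpaints the masked region; here the mask is simply filled from another frame, an assumption made for brevity):

```python
import numpy as np

def make_pseudo_anomaly(frame: np.ndarray, donor: np.ndarray,
                        rng: np.random.Generator, patch: int = 32):
    """Create a pseudo-anomalous frame by corrupting a random patch.

    Stand-in for the paper's inpainting step: the masked region is filled
    from a different ("donor") frame, producing an out-of-place region to
    train against under the OCC setting.
    """
    h, w = frame.shape[:2]
    y = int(rng.integers(0, h - patch))
    x = int(rng.integers(0, w - patch))
    out = frame.copy()
    out[y:y + patch, x:x + patch] = donor[y:y + patch, x:x + patch]
    mask = np.zeros((h, w), dtype=bool)
    mask[y:y + patch, x:x + patch] = True  # ground-truth anomaly mask
    return out, mask
```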
arXiv Detail & Related papers (2023-11-27T13:14:06Z)
- Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising [16.43285056788183]
We propose a novel approach called the Reconstruct-and-Generate Diffusion Model (RnG).
Our method leverages a reconstructive denoising network to recover the majority of the underlying clean signal.
It employs a diffusion algorithm to generate residual high-frequency details, thereby enhancing visual quality.
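The division of labor can be summarized in a few lines; `reconstructor` and `residual_sampler` below are assumed interfaces standing in for the paper's two modules, not its actual implementation:

```python
import torch

def reconstruct_and_generate(noisy, reconstructor, residual_sampler):
    """Sketch of the two-stage split described above.

    `reconstructor` is a deterministic denoiser recovering the bulk of
    the clean signal; `residual_sampler` stands in for a diffusion model
    sampling the missing high-frequency residual.
    """
    base = reconstructor(noisy)      # majority of the underlying signal
    detail = residual_sampler(base)  # generated high-frequency residual
    return base + detail
```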
arXiv Detail & Related papers (2023-09-19T16:01:20Z)
- DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection [38.12212015133935]
A novel framework, DiffusionVMR, is proposed to redefine the two tasks as a unified conditional denoising generation process.
Experiments conducted on five widely-used benchmarks demonstrate the effectiveness and flexibility of the proposed DiffusionVMR.
arXiv Detail & Related papers (2023-08-29T08:20:23Z)
- Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion.
This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement.
We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
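One plausible reading of "adapting concurrently with reconstruction" is brief test-time fine-tuning against a measurement-consistency loss. The sketch below is a hedged guess at that pattern, not the paper's exact procedure; `A` is the known forward operator, and the model is assumed to predict the clean image directly:

```python
import torch

def adapt_on_measurement(model, x_t, t, y, A, lr=1e-4, steps=1):
    """Hedged sketch of measurement-driven test-time adaptation.

    At a given sampling step, briefly fine-tune the model so its clean
    estimate stays consistent with the measurement y = A(x).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x0_hat = model(x_t, t)                # current clean estimate
        loss = ((A(x0_hat) - y) ** 2).mean()  # data-consistency loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```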
arXiv Detail & Related papers (2023-08-28T08:47:06Z)
- Exploring Diffusion Models for Unsupervised Video Anomaly Detection [17.816344808780965]
This paper investigates the performance of diffusion models for video anomaly detection (VAD).
Experiments performed on two large-scale anomaly detection datasets demonstrate the consistent improvement of the proposed method over the state-of-the-art generative models.
This is the first study to use diffusion models for VAD and to provide guidance for examining them in surveillance scenarios.
arXiv Detail & Related papers (2023-04-12T13:16:07Z)
- DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection [89.49600182243306]
We reformulate the reconstruction process of a diffusion model into a noise-to-norm paradigm.
We propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models.
The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration.
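A minimal scoring sketch under those assumptions (the segmentation head's interface is hypothetical; the residual fallback is just the obvious baseline):

```python
import torch

def pixel_anomaly_scores(x: torch.Tensor, restored: torch.Tensor,
                         seg_net=None) -> torch.Tensor:
    """Pixel-level anomaly scores from an input and its restoration.

    DiffusionAD feeds both images to a learned segmentation head (the
    interface here is hypothetical); without one, the per-pixel residual
    is the obvious baseline score.
    """
    if seg_net is not None:
        return seg_net(torch.cat([x, restored], dim=1))
    return (x - restored).abs().mean(dim=1, keepdim=True)
```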
arXiv Detail & Related papers (2023-03-15T16:14:06Z)
- Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
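The two stages have simple closed forms; the forward stage, for instance, can be sampled directly at any timestep (standard DDPM, with an illustrative linear schedule):

```python
import torch

def forward_diffuse(x0: torch.Tensor, t: int, alpha_bar: torch.Tensor):
    """Forward stage in closed form: q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    eps = torch.randn_like(x0)
    x_t = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return x_t, eps  # eps is the regression target for the reverse model

# Illustrative linear schedule, as in DDPM.
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
x_t, eps = forward_diffuse(torch.randn(1, 3, 64, 64), t=500, alpha_bar=alpha_bar)
```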
arXiv Detail & Related papers (2022-09-10T22:00:30Z)
- Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection [43.004613173363566]
A novel Motion Embedder (ME) is proposed to provide a pose motion representation from a probabilistic perspective.
A novel task-specific Spatial-Temporal Transformer (STT) is deployed for self-supervised pose sequence reconstruction.
MoPRL achieves state-of-the-art performance with an average improvement of 4.7% AUC on several challenging datasets.
arXiv Detail & Related papers (2021-12-07T11:52:25Z)