RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection
- URL: http://arxiv.org/abs/2511.13204v1
- Date: Mon, 17 Nov 2025 10:15:34 GMT
- Title: RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection
- Authors: Junhee Lee, ChaeBeen Bang, MyoungChul Kim, MyeongAh Cho,
- Abstract summary: Weakly-Supervised Video Anomaly Detection aims to identify anomalous events using only video-level labels.<n>Existing methods often oversimplify the anomaly space by treating all abnormal events as a single category.<n>We propose RefineVAD, a novel framework that mimics this dual-process reasoning.
- Score: 2.770730728142587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly-Supervised Video Anomaly Detection aims to identify anomalous events using only video-level labels, balancing annotation efficiency with practical applicability. However, existing methods often oversimplify the anomaly space by treating all abnormal events as a single category, overlooking the diverse semantic and temporal characteristics intrinsic to real-world anomalies. Inspired by how humans perceive anomalies, by jointly interpreting temporal motion patterns and semantic structures underlying different anomaly types, we propose RefineVAD, a novel framework that mimics this dual-process reasoning. Our framework integrates two core modules. The first, Motion-aware Temporal Attention and Recalibration (MoTAR), estimates motion salience and dynamically adjusts temporal focus via shift-based attention and global Transformer-based modeling. The second, Category-Oriented Refinement (CORE), injects soft anomaly category priors into the representation space by aligning segment-level features with learnable category prototypes through cross-attention. By jointly leveraging temporal dynamics and semantic structure, explicitly models both "how" motion evolves and "what" semantic category it resembles. Extensive experiments on WVAD benchmark validate the effectiveness of RefineVAD and highlight the importance of integrating semantic context to guide feature refinement toward anomaly-relevant patterns.
Related papers
- Weakly Supervised Video Anomaly Detection with Anomaly-Connected Components and Intention Reasoning [23.043341269626016]
We propose a novel framework named LAS-VAD, short for Learning Anomaly Semantics for WS-VAD.<n>Our framework integrates anomaly-connected component mechanism and intention awareness mechanism.<n>It outperforms current state-of-the-art methods with remarkable gains.
arXiv Detail & Related papers (2026-02-28T08:57:33Z) - Learning to Tell Apart: Weakly Supervised Video Anomaly Detection via Disentangled Semantic Alignment [47.507511439028754]
We propose a novel Disentangled Semantic Alignment Network (DSANet) to explicitly separate abnormal and normal features from coarse-grained and fine-grained aspects.<n>At the coarse-grained level, we introduce a self-guided normality modeling branch that reconstructs input video features under the guidance of learned normal prototypes.<n>At the fine-grained level, we present a decoupled contrastive semantic alignment mechanism, which first temporally decomposes each video into event-centric and background-centric components.
arXiv Detail & Related papers (2025-11-13T14:06:48Z) - DAMS:Dual-Branch Adaptive Multiscale Spatiotemporal Framework for Video Anomaly Detection [7.117824587276951]
This study offers a dual-path architecture called the Dual-Branch Adaptive Multiscale Stemporal Framework (DAMS), which is based on multilevel feature and decoupling fusion.<n>The main processing path integrates the Adaptive Multiscale Time Pyramid Network (AMTPN) with the Convolutional Block Attention Mechanism (CBAM)
arXiv Detail & Related papers (2025-07-28T08:42:00Z) - Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection [12.100563798908777]
Video Anomaly Detection (VAD) is essential for computer vision research.<n>Existing VAD methods utilize either reconstruction-based or prediction-based frameworks.<n>We address pose-based video anomaly detection and introduce a novel framework called Dual Conditioned Motion Diffusion.
arXiv Detail & Related papers (2024-12-23T01:31:39Z) - Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic Video-temporal PAs by inpainting a masked out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting.
Our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z) - Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z) - Fine-grained Temporal Contrastive Learning for Weakly-supervised
Temporal Action Localization [87.47977407022492]
This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in weakly-supervised action localization.
Under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting.
Our method achieves state-of-the-art performance on two popular benchmarks.
arXiv Detail & Related papers (2022-03-31T05:13:50Z) - Object-centric and memory-guided normality reconstruction for video
anomaly detection [56.64792194894702]
This paper addresses anomaly detection problem for videosurveillance.
Due to the inherent rarity and heterogeneity of abnormal events, the problem is viewed as a normality modeling strategy.
Our model learns object-centric normal patterns without seeing anomalous samples during training.
arXiv Detail & Related papers (2022-03-07T19:28:39Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation
Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.