Forward Consistency Learning with Gated Context Aggregation for Video Anomaly Detection
- URL: http://arxiv.org/abs/2601.18135v1
- Date: Mon, 26 Jan 2026 04:35:31 GMT
- Title: Forward Consistency Learning with Gated Context Aggregation for Video Anomaly Detection
- Authors: Jiahao Lyu, Minghua Zhao, Xuewen Huang, Yifei Chen, Shuangli Du, Jing Hu, Cheng Shi, Zhiyong Lv,
- Abstract summary: Video anomaly detection (VAD) aims to measure deviations from normal patterns for various events in real-time surveillance systems.<n>Most existing VAD methods rely on large-scale models to pursue extreme accuracy, limiting their feasibility on resource-limited edge devices.<n>We introduce FoGA, a lightweight VAD model that performs Forward consistency learning with Gated context aggregation.
- Score: 17.79982215633934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a crucial element of public security, video anomaly detection (VAD) aims to measure deviations from normal patterns for various events in real-time surveillance systems. However, most existing VAD methods rely on large-scale models to pursue extreme accuracy, limiting their feasibility on resource-limited edge devices. Moreover, mainstream prediction-based VAD detects anomalies using only single-frame future prediction errors, overlooking the richer constraints from longer-term temporal forward information. In this paper, we introduce FoGA, a lightweight VAD model that performs Forward consistency learning with Gated context Aggregation, containing about 2M parameters and tailored for potential edge devices. Specifically, we propose a Unet-based method that performs feature extraction on consecutive frames to generate both immediate and forward predictions. Then, we introduce a gated context aggregation module into the skip connections to dynamically fuse encoder and decoder features at the same spatial scale. Finally, the model is jointly optimized with a novel forward consistency loss, and a hybrid anomaly measurement strategy is adopted to integrate errors from both immediate and forward frames for more accurate detection. Extensive experiments demonstrate the effectiveness of the proposed method, which substantially outperforms state-of-the-art competing methods, running up to 155 FPS. Hence, our FoGA achieves an excellent trade-off between performance and the efficiency metric.
Related papers
- Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection [52.5174167737992]
Video anomaly detection (VAD) aims to identify abnormal events in videos.<n>We propose SteerVAD, which advances MLLM-based VAD by shifting from passively reading to actively steering and rectifying internal representations.<n>Our method achieves state-of-the-art performance among tuning-free approaches requiring only 1% of training data.
arXiv Detail & Related papers (2026-02-27T13:48:50Z) - Real-Time Proactive Anomaly Detection via Forward and Backward Forecast Modeling [0.0]
We introduce two proactive anomaly detection frameworks: the Forward Forecasting Model (FFM) and the Backward Reconstruction Model (BRM)<n>FFM forecasts future sequences to anticipate disruptions, while BRM reconstructs recent history from future context to uncover early precursors.<n>Our models support both continuous and discrete multivariate features, enabling robust performance in real-world settings.
arXiv Detail & Related papers (2026-02-12T03:57:41Z) - SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization [62.958457694151384]
We introduce preference optimization into precipitation nowcasting for the first time, motivated by the success of reinforcement learning from human feedback in large language models.<n>In the first stage, the framework focuses on reducing FAR, training the model to effectively suppress false alarms.
arXiv Detail & Related papers (2025-10-22T16:11:22Z) - Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching [14.503330877000758]
Time-Conditioned Contraction Matching is a novel method for semi-supervised anomaly detection in tabular data.<n>It is inspired by flow matching, a recent generative modeling framework that learns velocity fields between probability distributions.<n>Extensive experiments on the ADBench benchmark show that TCCM strikes a favorable balance between detection accuracy and inference cost.
arXiv Detail & Related papers (2025-10-21T06:26:38Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework.<n>It reframes the learning task to predict the residual deviation from an inertial reference.<n>On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - Anticipatory Fall Detection in Humans with Hybrid Directed Graph Neural Networks and Long Short-Term Memory [12.677218248209494]
We propose a hybrid model combining Dynamic Graph Neural Networks (DGNN) with Long Short-Term Memory (LSTM) networks to anticipate falls.<n>Our approach employs real-time skeletal features extracted from video sequences as input for the proposed model.<n>The LSTM-based network then predicts human movement in subsequent time steps, enabling early detection of falls.
arXiv Detail & Related papers (2025-09-01T12:56:31Z) - ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting [7.401111319849394]
ForeSight is a novel joint detection and forecasting framework for vision-based 3D perception in autonomous vehicles.<n>We show that ForeSight achieves state-of-the-art performance, achieving an EPA of 54.9%, surpassing previous methods by 9.3%.
arXiv Detail & Related papers (2025-08-09T20:18:10Z) - SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model [52.47816604709358]
Video anomaly detection (VAD) aims to identify unexpected events in videos and has wide applications in safety-critical domains.<n> vision-language models (VLMs) have demonstrated strong multimodal reasoning capabilities, offering new opportunities for anomaly detection.<n>We propose SlowFastVAD, a hybrid framework that integrates a fast anomaly detector with a slow anomaly detector.
arXiv Detail & Related papers (2025-04-14T15:30:03Z) - Interactive Test-Time Adaptation with Reliable Spatial-Temporal Voxels for Multi-Modal Segmentation [56.70910056845503]
Multi-modal test-time adaptation (MM-TTA) adapts models to an unlabeled target domain by leveraging the complementary multi-modal inputs in an online manner.<n>Previous MM-TTA methods for 3D segmentation suffer from two major limitations: 1) unstable frame-wise predictions caused by temporal inconsistency, and 2) consistently incorrect predictions that violate the assumption of reliable modality guidance.<n>This work introduces a comprehensive two-fold framework: Latte++ that better suppresses the unstable frame-wise predictions with more informative geometric correspondences, and Interactive Test-Time Adaptation (ITTA), a flexible add-on to empower effortless human feedback
arXiv Detail & Related papers (2024-03-11T06:56:08Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Learning Prompt-Enhanced Context Features for Weakly-Supervised Video
Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.