SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model
- URL: http://arxiv.org/abs/2504.10320v1
- Date: Mon, 14 Apr 2025 15:30:03 GMT
- Title: SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model
- Authors: Zongcan Ding, Haodong Zhang, Peng Wu, Guansong Pang, Zhiwei Yang, Peng Wang, Yanning Zhang,
- Abstract summary: Video anomaly detection (VAD) aims to identify unexpected events in videos and has wide applications in safety-critical domains.<n> vision-language models (VLMs) have demonstrated strong multimodal reasoning capabilities, offering new opportunities for anomaly detection.<n>We propose SlowFastVAD, a hybrid framework that integrates a fast anomaly detector with a slow anomaly detector.
- Score: 52.47816604709358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection (VAD) aims to identify unexpected events in videos and has wide applications in safety-critical domains. While semi-supervised methods trained on only normal samples have gained traction, they often suffer from high false alarm rates and poor interpretability. Recently, vision-language models (VLMs) have demonstrated strong multimodal reasoning capabilities, offering new opportunities for explainable anomaly detection. However, their high computational cost and lack of domain adaptation hinder real-time deployment and reliability. Inspired by dual complementary pathways in human visual perception, we propose SlowFastVAD, a hybrid framework that integrates a fast anomaly detector with a slow anomaly detector (namely a retrieval augmented generation (RAG) enhanced VLM), to address these limitations. Specifically, the fast detector first provides coarse anomaly confidence scores, and only a small subset of ambiguous segments, rather than the entire video, is further analyzed by the slower yet more interpretable VLM for elaborate detection and reasoning. Furthermore, to adapt VLMs to domain-specific VAD scenarios, we construct a knowledge base including normal patterns based on few normal samples and abnormal patterns inferred by VLMs. During inference, relevant patterns are retrieved and used to augment prompts for anomaly reasoning. Finally, we smoothly fuse the anomaly confidence of fast and slow detectors to enhance robustness of anomaly detection. Extensive experiments on four benchmarks demonstrate that SlowFastVAD effectively combines the strengths of both fast and slow detectors, and achieves remarkable detection accuracy and interpretability with significantly reduced computational overhead, making it well-suited for real-world VAD applications with high reliability requirements.
Related papers
- AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis [52.261173507177396]
We introduce AssistPDA, the first online video anomaly surveillance assistant (VAPDA) that unifies anomaly prediction, detection, and analysis (VAPDA) within a single framework.<n> AssistPDA enables real-time inference on streaming videos while supporting interactive user engagement.<n>We also introduce a novel event-level anomaly prediction task, enabling proactive anomaly forecasting before anomalies fully unfold.
arXiv Detail & Related papers (2025-03-27T18:30:47Z) - Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems [2.0179223501624786]
This paper presents TCVADS (Two-stage Cross-modal Video Anomaly Detection System), which leverages knowledge distillation and cross-modal contrastive learning.
Experimental results demonstrate that TCVADS significantly outperforms existing methods in model performance, detection efficiency, and interpretability.
arXiv Detail & Related papers (2024-12-28T16:24:35Z) - Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learnstemporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs)
Our method achieves state-of-theart performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z) - Beyond the Benchmark: Detecting Diverse Anomalies in Videos [0.6993026261767287]
Video Anomaly Detection (VAD) plays a crucial role in modern surveillance systems, aiming to identify various anomalies in real-world situations.
Current benchmark datasets predominantly emphasize simple, single-frame anomalies such as novel object detection.
We advocate for an expansion of VAD investigations to encompass intricate anomalies that extend beyond conventional benchmark boundaries.
arXiv Detail & Related papers (2023-10-03T09:22:06Z) - Real-Time Driver Monitoring Systems through Modality and View Analysis [28.18784311981388]
Driver distractions are known to be the dominant cause of road accidents.
State-of-the-art methods prioritize accuracy while ignoring latency.
We propose time-effective detection models by neglecting the temporal relation between video frames.
arXiv Detail & Related papers (2022-10-17T21:22:41Z) - FastAno: Fast Anomaly Detection via Spatio-temporal Patch Transformation [6.112591965159383]
We propose spatial rotation transformation (SRT) and temporal mixing transformation (TMT) to generate irregular patch cuboids within normal frame cuboids.
Our model is evaluated on three anomaly detection benchmarks, achieving competitive accuracy and surpassing all the previous works in terms of speed.
arXiv Detail & Related papers (2021-06-16T08:14:31Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.