AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis
- URL: http://arxiv.org/abs/2503.21904v1
- Date: Thu, 27 Mar 2025 18:30:47 GMT
- Title: AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis
- Authors: Zhiwei Yang, Chen Gao, Jing Liu, Peng Wu, Guansong Pang, Mike Zheng Shou
- Abstract summary: We introduce AssistPDA, the first online video anomaly surveillance assistant that unifies anomaly prediction, detection, and analysis (VAPDA) within a single framework. AssistPDA enables real-time inference on streaming videos while supporting interactive user engagement. We also introduce a novel event-level anomaly prediction task, enabling proactive anomaly forecasting before anomalies fully unfold.
- Score: 52.261173507177396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancements in large language models (LLMs) have spurred growing interest in LLM-based video anomaly detection (VAD). However, existing approaches predominantly focus on video-level anomaly question answering or offline detection, ignoring the real-time nature essential for practical VAD applications. To bridge this gap and facilitate the practical deployment of LLM-based VAD, we introduce AssistPDA, the first online video anomaly surveillance assistant that unifies video anomaly prediction, detection, and analysis (VAPDA) within a single framework. AssistPDA enables real-time inference on streaming videos while supporting interactive user engagement. Notably, we introduce a novel event-level anomaly prediction task, enabling proactive anomaly forecasting before anomalies fully unfold. To enhance the ability to model intricate spatiotemporal relationships in anomaly events, we propose a Spatio-Temporal Relation Distillation (STRD) module. STRD transfers the long-term spatiotemporal modeling capabilities of vision-language models (VLMs) from offline settings to real-time scenarios. Thus it equips AssistPDA with a robust understanding of complex temporal dependencies and long-sequence memory. Additionally, we construct VAPDA-127K, the first large-scale benchmark designed for VLM-based online VAPDA. Extensive experiments demonstrate that AssistPDA outperforms existing offline VLM-based approaches, setting a new state-of-the-art for real-time VAPDA. Our dataset and code will be open-sourced to facilitate further research in the community.
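The abstract describes STRD as transferring an offline VLM's long-term spatiotemporal modeling to a real-time student. The paper's actual objective is not given here, but the core idea of feature distillation can be sketched minimally: an offline "teacher" produces per-frame spatiotemporal features over a full clip, and an online "student" is trained to match them. All names, shapes, and the MSE objective below are illustrative assumptions, not the paper's API.

```python
# Hypothetical sketch of feature-level distillation in the spirit of STRD.
# Teacher: offline VLM features computed over the whole clip.
# Student: online model features computed frame-by-frame on the stream.
# The loss pulls the student's per-frame features toward the teacher's.

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(teacher_feats, student_feats):
    """Average per-frame MSE between teacher and student feature vectors.

    teacher_feats, student_feats: lists of per-frame feature vectors
    (one vector per video frame, aligned in time).
    """
    assert len(teacher_feats) == len(student_feats)
    frame_losses = [mse(t, s) for t, s in zip(teacher_feats, student_feats)]
    return sum(frame_losses) / len(frame_losses)

# Toy example: 2 frames, 3-dimensional features.
teacher = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
student = [[1.0, 0.0, 2.0], [0.0, 0.5, 0.5]]
loss = distillation_loss(teacher, student)
```

In practice such a student would also carry a recurrent or cached memory so it can approximate long-range temporal context from only the streamed prefix; that detail is omitted here.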
Related papers
- SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model [52.47816604709358]
Video anomaly detection (VAD) aims to identify unexpected events in videos and has wide applications in safety-critical domains.
vision-language models (VLMs) have demonstrated strong multimodal reasoning capabilities, offering new opportunities for anomaly detection.
We propose SlowFastVAD, a hybrid framework that integrates a fast conventional anomaly detector with a slower RAG-enhanced VLM-based detector.
arXiv Detail & Related papers (2025-04-14T15:30:03Z) - Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM [35.06386971859359]
Holmes-VAD is a novel framework that leverages precise temporal supervision and rich multimodal instructions.
We construct the first large-scale multimodal VAD instruction-tuning benchmark, VAD-Instruct50k.
Building upon the VAD-Instruct50k dataset, we develop a customized solution for interpretable video anomaly detection.
arXiv Detail & Related papers (2024-06-18T03:19:24Z) - Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z) - Video Anomaly Detection and Explanation via Large Language Models [34.52845566893497]
Video Anomaly Detection (VAD) aims to localize abnormal events on the timeline of long-range surveillance videos.
In this paper, we conduct pioneering research on equipping the VAD framework with video-based large language models (VLLMs).
We introduce a novel network module Long-Term Context (LTC) to mitigate the incapability of VLLMs in long-range context modeling.
arXiv Detail & Related papers (2024-01-11T07:09:44Z) - Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z) - Online Anomaly Detection over Live Social Video Streaming [17.73632683825434]
Social video anomaly detection plays a critical role in applications from e-commerce to e-learning.
Traditionally, anomaly detection techniques are applied to find anomalies in video broadcasting.
We propose a generic framework for effective online detection of Anomalies Over social Video LIve Streaming.
arXiv Detail & Related papers (2023-12-01T23:30:45Z) - Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has been paid increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos via cross-modal queries.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z) - CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection [3.146076597280736]
Video anomaly detection (VAD) is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video.
We first propose to utilize ViT-encoded visual features from CLIP, in contrast to the conventional C3D or I3D features used in the domain, to efficiently extract discriminative representations.
Our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on three commonly-used benchmark datasets in the VAD problem.
arXiv Detail & Related papers (2022-12-09T22:28:24Z) - Adversarial Imitation Learning from Video using a State Observer [50.45370139579214]
We introduce a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO).
At its core, VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised state observer.
We show experimentally in several continuous control environments that VGAIfO-SO is more sample efficient than other IfO algorithms at learning from video-only demonstrations.
arXiv Detail & Related papers (2022-02-01T06:46:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.