Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection
- URL: http://arxiv.org/abs/2505.15205v2
- Date: Fri, 23 May 2025 00:54:21 GMT
- Title: Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection
- Authors: Hyogun Lee, Haksub Kim, Ig-Jae Kim, Yonghun Choi
- Abstract summary: Flashback is a zero-shot and real-time video anomaly detection paradigm. Inspired by the human cognitive mechanism of instantly judging anomalies, Flashback operates in two stages: Recall and Respond. By eliminating all LLM calls at inference time, Flashback delivers real-time VAD even on a consumer-grade GPU.
- Score: 11.197888893266535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video Anomaly Detection (VAD) automatically identifies anomalous events from video, mitigating the need for human operators in large-scale surveillance deployments. However, two fundamental obstacles hinder real-world adoption: domain dependency and real-time constraints -- requiring near-instantaneous processing of incoming video. To this end, we propose Flashback, a zero-shot and real-time video anomaly detection paradigm. Inspired by the human cognitive mechanism of instantly judging anomalies and reasoning in current scenes based on past experience, Flashback operates in two stages: Recall and Respond. In the offline recall stage, an off-the-shelf LLM builds a pseudo-scene memory of both normal and anomalous captions without any reliance on real anomaly data. In the online respond stage, incoming video segments are embedded and matched against this memory via similarity search. By eliminating all LLM calls at inference time, Flashback delivers real-time VAD even on a consumer-grade GPU. On two large datasets from real-world surveillance scenarios, UCF-Crime and XD-Violence, we achieve 87.3 AUC (+7.0 pp) and 75.1 AP (+13.1 pp), respectively, outperforming prior zero-shot VAD methods by large margins.
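The abstract describes an online Respond stage in which video segments are embedded and matched against a caption memory via similarity search. A minimal sketch of that idea is shown below, assuming cosine similarity over L2-normalized embeddings and a nearest-neighbor decision rule; the function names, memory layout, and scoring rule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a memory-matching "Respond" stage: score an incoming
# segment by its most similar entry in a precomputed caption memory.
import numpy as np

def build_memory(normal_embs, anomalous_embs):
    """Stack precomputed caption embeddings (from the offline Recall stage)."""
    memory = np.vstack([normal_embs, anomalous_embs])
    labels = np.array([0] * len(normal_embs) + [1] * len(anomalous_embs))
    # L2-normalize rows so dot products equal cosine similarities.
    memory = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return memory, labels

def anomaly_score(segment_emb, memory, labels):
    """Return 1.0 if the segment's nearest caption is anomalous, else 0.0."""
    q = segment_emb / np.linalg.norm(segment_emb)
    sims = memory @ q                      # cosine similarity to every caption
    return float(labels[np.argmax(sims)])  # label of the best-matching caption

# Toy 2-D example: one normal and one anomalous caption embedding.
mem, lab = build_memory(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
print(anomaly_score(np.array([0.1, 0.9]), mem, lab))  # 1.0: closest to anomalous
```

Because no LLM is invoked here, the per-segment cost is a single matrix-vector product, which is what makes real-time inference on a consumer GPU plausible.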
Related papers
- No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection [15.949619310702579]
Existing video anomaly detection methods underperform in open-world scenarios. Key contributing factors include limited dataset diversity and inadequate understanding of context-dependent anomalous semantics. We propose LAVIDA, an end-to-end zero-shot video anomaly detection framework.
arXiv Detail & Related papers (2026-02-22T16:03:43Z) - TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection [0.0]
This paper addresses the context-aware zero-shot anomaly detection challenge. Our approach defines a memory-augmented pipeline, correlating temporal signals with visual embeddings. We achieve 90.4% AUC on UCF-Crime and 83.67% AP on XD-Violence, a new state-of-the-art among zero-shot models.
arXiv Detail & Related papers (2025-11-01T14:54:08Z) - Evaluation of Vision-LLMs in Surveillance Video [8.750453732584491]
This paper investigates the spatial reasoning of vision-language models (VLMs). It addresses the embodied perception challenge of interpreting dynamic 3D scenes from sparse 2D video. We evaluate four open models on UCF-Crime and RWF-2000 under prompting and privacy-preserving conditions.
arXiv Detail & Related papers (2025-10-27T10:27:02Z) - Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models [20.102770709407437]
Cerberus is a two-stage cascaded system designed for efficient yet accurate real-time VAD. It learns normal behavioral rules offline, and combines lightweight filtering with fine-grained VLM reasoning during online inference. Cerberus on average achieves 57.68 fps on an NVIDIA L40S GPU, a 151.79× speedup, and 97.2% accuracy comparable to the state-of-the-art VLM-based VAD methods.
arXiv Detail & Related papers (2025-10-18T01:27:23Z) - EventVAD: Training-Free Event-Aware Video Anomaly Detection [19.714436150837148]
EventVAD is an event-aware video anomaly detection framework. It combines tailored dynamic graph architectures and multimodal-event reasoning. It achieves state-of-the-art (SOTA) performance in training-free settings, outperforming strong baselines that use 7B or larger MLLMs.
arXiv Detail & Related papers (2025-04-17T16:59:04Z) - AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis [52.261173507177396]
We introduce AssistPDA, the first online video surveillance assistant that unifies video anomaly prediction, detection, and analysis (VAPDA) within a single framework. AssistPDA enables real-time inference on streaming videos while supporting interactive user engagement. We also introduce a novel event-level anomaly prediction task, enabling proactive anomaly forecasting before anomalies fully unfold.
arXiv Detail & Related papers (2025-03-27T18:30:47Z) - StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition [19.54521322177521]
We introduce StreamMind, a video LLM framework that achieves ultra-FPS streaming video processing (100 fps on a single A100). We propose a novel perception-cognition intertemporal paradigm named "event-gated LLM invocation". Experiments on Ego4D and SoccerNet streaming tasks, as well as standard offline benchmarks, demonstrate state-of-the-art performance in both model capability and real-time efficiency.
arXiv Detail & Related papers (2025-03-08T13:44:38Z) - Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos. Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models. We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z) - Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z) - Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has been paid increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos by cross-modalities.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z) - Memory-augmented Online Video Anomaly Detection [2.269915940890348]
This paper presents a system capable to work in an online fashion, exploiting only the videos captured by a dash-mounted camera.
MOVAD is able to reach an AUC score of 82.17%, surpassing the current state-of-the-art by +2.87 AUC.
arXiv Detail & Related papers (2023-02-21T15:14:27Z) - Anomaly detection in surveillance videos using transformer based attention model [3.2968779106235586]
This research suggests using a weakly supervised strategy to avoid annotating anomalous segments in training videos.
The proposed framework is validated on a real-world dataset, i.e., the ShanghaiTech Campus dataset.
arXiv Detail & Related papers (2022-06-03T12:19:39Z) - Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection [27.433162897608543]
We propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection.
It contains three key components, i.e., a convolutional encoder to capture the spatial information of input clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to predict the future frame.
arXiv Detail & Related papers (2021-07-29T03:07:25Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - Multiple Instance-Based Video Anomaly Detection using Deep Temporal Encoding-Decoding [5.255783459833821]
We propose a weakly supervised deep temporal encoding-decoding solution for anomaly detection in surveillance videos.
The proposed approach uses both abnormal and normal video clips during the training phase.
The results show that the proposed method performs similar to or better than the state-of-the-art solutions for anomaly detection in video surveillance applications.
arXiv Detail & Related papers (2020-07-03T08:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.