Related papers: Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling

URL: http://arxiv.org/abs/2508.18463v2
Date: Wed, 27 Aug 2025 09:43:57 GMT
Title: Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling
Authors: Md. Rashid Shahriar Khan, Md. Abrar Hasan, Mohammod Tareq Aziz Justice,
Abstract summary: This work introduces a context-aware zero-shot anomaly detection framework that identifies abnormal events without exposure to anomaly examples during training. The proposed hybrid architecture combines TimeSformer, DPC, and CLIP to extract rich spatial-temporal features. A context-gating mechanism further enhances decision-making by modulating predictions with scene-aware cues or global video features.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Detecting anomalies in surveillance footage is inherently challenging due to their unpredictable and context-dependent nature. This work introduces a novel context-aware zero-shot anomaly detection framework that identifies abnormal events without exposure to anomaly examples during training. The proposed hybrid architecture combines TimeSformer, DPC, and CLIP to model spatiotemporal dynamics and semantic context. TimeSformer serves as the vision backbone to extract rich spatial-temporal features, while DPC forecasts future representations to identify temporal deviations. Furthermore, a CLIP-based semantic stream enables concept-level anomaly detection through context-specific text prompts. These components are jointly trained using InfoNCE and CPC losses, aligning visual inputs with their temporal and semantic representations. A context-gating mechanism further enhances decision-making by modulating predictions with scene-aware cues or global video features. By integrating predictive modeling with vision-language understanding, the system can generalize to previously unseen behaviors in complex environments. This framework bridges the gap between temporal reasoning and semantic context in zero-shot anomaly detection for surveillance. The code for this research has been made available at https://github.com/NK-II/Context-Aware-Zero-Shot-Anomaly-Detection-in-Surveillance.
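The abstract names InfoNCE as one of the training losses but does not spell it out. The following is a minimal sketch of the generic InfoNCE objective over paired embeddings, not the authors' implementation. The function names, the temperature of 0.07, and the toy vectors are assumptions chosen for illustration only.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] form the positive pair; every other key in the
    batch acts as a negative. The temperature value is an assumed default.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the positive key as the target class.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy embeddings (hypothetical): two orthogonal unit vectors.
aligned = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = info_nce(aligned, aligned)
loss_swapped = info_nce(aligned, [aligned[1], aligned[0]])
print(loss_aligned, loss_swapped)
```

In this toy case the loss is near zero when each query matches its own key and large when the pairs are swapped, which is the behavior a contrastive alignment objective is meant to reward.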

Related papers

Weakly Supervised Video Anomaly Detection with Anomaly-Connected Components and Intention Reasoning [23.043341269626016]
We propose a novel framework named LAS-VAD, short for Learning Anomaly Semantics for WS-VAD. Our framework integrates anomaly-connected component mechanism and intention awareness mechanism. It outperforms current state-of-the-art methods with remarkable gains.
arXiv Detail & Related papers (2026-02-28T08:57:33Z)
A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis [64.42659342276117]
Most video-anomaly research stops at frame-wise detection, offering little insight into why an event is abnormal. Recent video anomaly localization and video anomaly understanding methods improve explainability but remain data-dependent and task-specific. We propose a unified reasoning framework that bridges the gap between temporal detection, spatial localization, and textual explanation.
arXiv Detail & Related papers (2025-11-02T14:49:08Z)
TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection [0.0]
This paper addresses the context-aware zero-shot anomaly detection challenge. Our approach defines a memory-augmented pipeline, correlating temporal signals with visual embeddings. We achieve 90.4% AUC on UCF-Crime and 83.67% AP on XD-Violence, a new state-of-the-art among zero-shot models.
arXiv Detail & Related papers (2025-11-01T14:54:08Z)
Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization [60.73623588349311]
We propose a universal context-aware contrastive learning framework (UniCaCLF) for temporal forgery localization. Our approach leverages supervised contrastive learning to discover and identify forged instants by means of anomaly detection. An efficient context-aware contrastive coding is introduced to further push the limit of instant feature distinguishability between genuine and forged instants.
arXiv Detail & Related papers (2025-06-10T06:40:43Z)
Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs). Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z)
Harnessing Contrastive Learning and Neural Transformation for Time Series Anomaly Detection [0.0]
Time series anomaly detection (TSAD) plays a vital role in many industrial applications. Contrastive learning has gained momentum in the time series domain for its prowess in extracting meaningful representations from unlabeled data. In this study, we propose a novel approach, CNT, that incorporates a window-based contrastive learning strategy fortified with learnable transformations.
arXiv Detail & Related papers (2023-04-16T21:36:19Z)
Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences. Current methods often assume that the observed sequences are complete while ignoring the potential for missing values. This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
arXiv Detail & Related papers (2023-03-28T14:27:27Z)
TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals. Our approach locally modulates the saliency predictions by combining the learned temporal maps. Our code will be publicly available on GitHub.
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning [42.22064610886404]
We present a general framework of predictive learning, in which the encoder and decoder capture intra-frame features and the middle temporal module catches inter-frame dependencies. To parallelize the temporal module, we propose the Temporal Attention Unit (TAU), which decomposes the temporal attention into intra-frame statical attention and inter-frame dynamical attention.
arXiv Detail & Related papers (2022-06-24T07:43:50Z)
An Attention-based ConvLSTM Autoencoder with Dynamic Thresholding for Unsupervised Anomaly Detection in Multivariate Time Series [2.9685635948299995]
We propose an unsupervised Attention-based Convolutional Long Short-Term Memory (ConvLSTM) Autoencoder with Dynamic Thresholding (ACLAE-DT) framework for anomaly detection and diagnosis. The framework starts by pre-processing and enriching the data, before constructing feature images to characterize the system statuses. The constructed feature images are fed into an attention-based ConvLSTM autoencoder, which aims to encode the constructed feature images and capture the temporal behavior. The reconstruction errors are then computed and subjected to a statistical-based, dynamic thresholding mechanism to detect and diagnose the anomalies.
arXiv Detail & Related papers (2022-01-23T04:01:43Z)
Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes [51.65308857232767]
Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks. We develop a novel approach to perform context consistency checks using language models.
arXiv Detail & Related papers (2021-08-19T00:52:10Z)
Neural Contextual Anomaly Detection for Time Series [7.523820334642732]
We introduce Neural Contextual Anomaly Detection (NCAD), a framework for anomaly detection on time series. NCAD scales seamlessly from the unsupervised to supervised setting. We demonstrate empirically on standard benchmark datasets that our approach obtains a state-of-the-art performance.
arXiv Detail & Related papers (2021-07-16T04:33:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.