Related papers: Human-Centric Anomaly Detection in Surveillance Videos Using YOLO-World and Spatio-Temporal Deep Learning

Human-Centric Anomaly Detection in Surveillance Videos Using YOLO-World and Spatio-Temporal Deep Learning

URL: http://arxiv.org/abs/2510.22056v1
Date: Fri, 24 Oct 2025 22:38:17 GMT
Title: Human-Centric Anomaly Detection in Surveillance Videos Using YOLO-World and Spatio-Temporal Deep Learning
Authors: Mohammad Ali Etemadi Naeen, Hoda Mohammadzade, Saeed Bagheri Shouraki,
Abstract summary: Anomaly detection in surveillance videos remains a challenging task due to the diversity of abnormal events, class imbalance, and scene-dependent visual clutter.<n>We propose a robust deep learning framework that integrates human-weighted preprocessing withtemporal modeling for multi-class anomaly classification.<n>Our method achieves a mean test accuracy of 92.41% across three independent trials, with per-class F1-scores consistently exceeding 0.85.
Score: 0.1019561860229868
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Anomaly detection in surveillance videos remains a challenging task due to the diversity of abnormal events, class imbalance, and scene-dependent visual clutter. To address these issues, we propose a robust deep learning framework that integrates human-centric preprocessing with spatio-temporal modeling for multi-class anomaly classification. Our pipeline begins by applying YOLO-World - an open-vocabulary vision-language detector - to identify human instances in raw video clips, followed by ByteTrack for consistent identity-aware tracking. Background regions outside detected bounding boxes are suppressed via Gaussian blurring, effectively reducing scene-specific distractions and focusing the model on behaviorally relevant foreground content. The refined frames are then processed by an ImageNet-pretrained InceptionV3 network for spatial feature extraction, and temporal dynamics are captured using a bidirectional LSTM (BiLSTM) for sequence-level classification. Evaluated on a five-class subset of the UCF-Crime dataset (Normal, Burglary, Fighting, Arson, Explosion), our method achieves a mean test accuracy of 92.41% across three independent trials, with per-class F1-scores consistently exceeding 0.85. Comprehensive evaluation metrics - including confusion matrices, ROC curves, and macro/weighted averages - demonstrate strong generalization and resilience to class imbalance. The results confirm that foreground-focused preprocessing significantly enhances anomaly discrimination in real-world surveillance scenarios.

Related papers

Anomalous Decision Discovery using Inverse Reinforcement Learning [3.3675535571071746]
Anomaly detection plays a critical role in Autonomous Vehicles (AVs) by identifying unusual behaviors through perception systems.<n>Current approaches, which often rely on predefined thresholds or supervised learning paradigms, exhibit reduced efficacy when confronted with unseen scenarios.<n>We present Trajectory-Reward Guided Adaptive Pre-training (TRAP), a novel IRL framework for anomaly detection.
arXiv Detail & Related papers (2025-07-06T17:01:02Z)
UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection [11.250490586786878]
Video anomaly detection aims to develop automated models capable of identifying abnormal events in surveillance videos.<n>We show that distilling knowledge from aggregated representations of multiple backbones into a single-backbone Student model achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-06-05T00:44:42Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic Video-temporal PAs by inpainting a masked out region of an image. In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting. Our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z)
Noisy-ArcMix: Additive Noisy Angular Margin Loss Combined With Mixup Anomalous Sound Detection [5.1308092683559225]
Unsupervised anomalous sound detection (ASD) aims to identify anomalous sounds by learning the features of normal operational sounds and sensing their deviations. Recent approaches have focused on the self-supervised task utilizing the classification of normal data, and advanced models have shown that securing representation space for anomalous data is important. We propose a training technique aimed at ensuring intra-class compactness and increasing the angle gap between normal and abnormal samples.
arXiv Detail & Related papers (2023-10-10T07:04:36Z)
Locality-aware Attention Network with Discriminative Dynamics Learning for Weakly Supervised Anomaly Detection [0.8883733362171035]
We propose a Discriminative Dynamics Learning (DDL) method with two objective functions, i.e., dynamics ranking loss and dynamics alignment loss. A Locality-aware Attention Network (LA-Net) is constructed to capture global correlations and re-calibrate the location preference across snippets, followed by a multilayer perceptron with causal convolution to obtain anomaly scores.
arXiv Detail & Related papers (2022-08-11T04:27:33Z)
Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training. Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system. The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning. In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection. We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection [20.368114998124295]
We propose a weakly supervised anomaly detection method which has manifold contributions. The proposed method obtains83.03% and 89.67% frame-level AUC performance on the UCF Crime and ShanghaiTech datasets respectively.
arXiv Detail & Related papers (2020-11-24T13:27:40Z)
Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design. Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.