Related papers: AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

URL: http://arxiv.org/abs/2503.04504v1
Date: Thu, 06 Mar 2025 14:52:34 GMT
Title: AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
Authors: Sunghyun Ahn, Youngwan Jo, Kijung Lee, Sein Kwon, Inpyo Hong, Sanghyun Park,
Abstract summary: Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision.<n>Existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments.<n>This study proposes customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model.
Score: 1.7051307941715268
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users should retrain models or develop separate AI models for new environments, which requires expertise in machine learning, high-performance hardware, and extensive data collection, limiting the practical usability of VAD. To address these challenges, this study proposes customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD considers user-defined text as an abnormal event and detects frames containing a specified event in a video. We effectively implemented AnyAnomaly using a context-aware visual question answering without fine-tuning the large vision language model. To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive performance on VAD benchmark datasets, achieving state-of-the-art results on the UBnormal dataset and outperforming other methods in generalization across all datasets. Our code is available online at github.com/SkiddieAhn/Paper-AnyAnomaly.

Related papers

GV-VAD : Exploring Video Generation for Weakly-Supervised Video Anomaly Detection [6.09434007746295]
Video anomaly detection (VAD) plays a critical role in public safety applications such as intelligent surveillance.<n>We propose a generative video-enhanced weakly-supervised VAD framework to produce semantically controllable and physically plausible synthetic videos.<n>The proposed framework outperforms state-of-the-art methods on UCF-Crime datasets.
arXiv Detail & Related papers (2025-08-01T04:42:40Z)
From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection [60.11169426478452]
This paper aims to introduce fixation information to assist the detection of salient objects under weak supervision.<n>We propose a Position and Semantic Embedding (PSE) module to provide location and semantic guidance during the feature learning process.<n>An Intra-Inter Mixed Contrastive (MCII) model improves thetemporal modeling capabilities under weak supervision.
arXiv Detail & Related papers (2025-06-30T05:01:40Z)
Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance. Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
Deep Learning for Video Anomaly Detection: A Review [52.74513211976795]
Video anomaly detection (VAD) aims to discover behaviors or events deviating from the normality in videos. In the era of deep learning, a great variety of deep learning based methods are constantly emerging for the VAD task. This review covers the spectrum of five different categories, namely, semi-supervised, weakly supervised, fully supervised, unsupervised and open-set supervised VAD.
arXiv Detail & Related papers (2024-09-09T07:31:16Z)
Towards Adaptive Human-centric Video Anomaly Detection: A Comprehensive Framework and A New Benchmark [2.473948454680334]
Human-centric Video Anomaly Detection (VAD) aims to identify human behaviors that deviate from normal. We introduce the HuVAD (Human-centric privacy-enhanced Video Anomaly Detection) dataset and a novel Unsupervised Continual Anomaly Learning framework.
arXiv Detail & Related papers (2024-08-26T14:55:23Z)
MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation [5.0923114224599555]
This paper introduces a novel hierarchical graph neural network (GNN) based model MissionGNN. Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models. Our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches.
arXiv Detail & Related papers (2024-06-27T01:09:07Z)
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos. Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models. We evaluate nine existing Video-LMMs, both open and closed sources, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models. BVS supports a large number of adjustable parameters at the scene level. We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
Harnessing Large Language Models for Training-free Video Anomaly Detection [34.76811491190446]
Video anomaly detection (VAD) aims to temporally locate abnormal events in a video. Training-based methods are prone to be domain-specific, thus being costly for practical deployment. We propose LAnguage-based VAD (LAVAD), a method tackling VAD in a novel, training-free paradigm.
arXiv Detail & Related papers (2024-04-01T09:34:55Z)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal. Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos. This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection [14.089888316857426]
This paper focuses on weakly supervised video anomaly detection. We develop a lightweight video anomaly detection model. We show that our model can achieve comparable or even superior AUC score compared to the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T01:23:08Z)
Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects. We tackle this problem from two different angles: algorithm and dataset. We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.