Learning Event Completeness for Weakly Supervised Video Anomaly Detection
- URL: http://arxiv.org/abs/2506.13095v1
- Date: Mon, 16 Jun 2025 04:56:58 GMT
- Title: Learning Event Completeness for Weakly Supervised Video Anomaly Detection
- Authors: Yu Wang, Shiwei Chen
- Abstract summary: We present LEC-VAD, a novel method for Learning Event Completeness in Weakly Supervised Video Anomaly Detection. LEC-VAD encodes both category-aware and category-agnostic semantics between vision and language. We develop a novel memory bank-based prototype learning mechanism to enrich the concise text descriptions associated with anomaly-event categories.
- Score: 5.140169437190526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised video anomaly detection (WS-VAD) is tasked with pinpointing temporal intervals containing anomalous events within untrimmed videos, utilizing only video-level annotations. However, a significant challenge arises due to the absence of dense frame-level annotations, often leading to incomplete localization in existing WS-VAD methods. To address this issue, we present a novel LEC-VAD, Learning Event Completeness for Weakly Supervised Video Anomaly Detection, which features a dual structure designed to encode both category-aware and category-agnostic semantics between vision and language. Within LEC-VAD, we devise semantic regularities that leverage an anomaly-aware Gaussian mixture to learn precise event boundaries, thereby yielding more complete event instances. Besides, we develop a novel memory bank-based prototype learning mechanism to enrich concise text descriptions associated with anomaly-event categories. This innovation bolsters the text's expressiveness, which is crucial for advancing WS-VAD. Our LEC-VAD demonstrates remarkable advancements over the current state-of-the-art methods on two benchmark datasets XD-Violence and UCF-Crime.
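The abstract only names its mechanisms, but the boundary-learning idea can be illustrated independently of the paper's internals. Below is a minimal, hypothetical Python sketch (not the authors' implementation): given per-snippet anomaly scores from any WS-VAD backbone, fit a two-component Gaussian mixture and read off contiguous snippets assigned to the higher-mean component as complete event intervals. The function name and all parameters are assumptions for illustration.

```python
# Illustrative sketch only: recover complete event intervals from
# per-snippet anomaly scores with a two-component Gaussian mixture.
# This is NOT LEC-VAD's formulation (its anomaly-aware mixture operates
# inside the model); it only shows the general idea post hoc.
import numpy as np
from sklearn.mixture import GaussianMixture

def event_intervals_from_scores(scores: np.ndarray, min_len: int = 3):
    """Fit a 2-component GMM to snippet scores and return contiguous
    intervals assigned to the higher-mean (anomalous) component."""
    gmm = GaussianMixture(n_components=2, random_state=0)
    gmm.fit(scores.reshape(-1, 1))
    anomalous = int(np.argmax(gmm.means_.ravel()))  # larger-mean component
    labels = gmm.predict(scores.reshape(-1, 1)) == anomalous

    intervals, start = [], None
    for t, flag in enumerate(labels):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_len:
                intervals.append((start, t))  # [start, t) in snippet units
            start = None
    if start is not None and len(labels) - start >= min_len:
        intervals.append((start, len(labels)))
    return intervals

# Toy usage: a score curve with one long anomalous event in the middle.
scores = np.concatenate([np.random.rand(40) * 0.2,
                         0.7 + np.random.rand(20) * 0.3,
                         np.random.rand(40) * 0.2])
print(event_intervals_from_scores(scores))  # roughly [(40, 60)]
```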
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z)
- LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction [8.163356555241322]
We propose a novel framework, called LaSe-E2V, that can achieve semantic-aware high-quality E2V reconstruction.
We first propose an Event-guided Spatiotemporal Attention (ESA) module to effectively condition the denoising pipeline on the event data.
We then introduce an event-aware mask loss to ensure temporal coherence and a noise strategy to enhance spatial consistency.
arXiv Detail & Related papers (2024-07-08T01:40:32Z)
- MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation [5.0923114224599555]
Video Anomaly Detection and Video Anomaly Recognition are critically important for applications in intelligent surveillance, evidence investigation, violence alerting, etc. These tasks face significant challenges due to the rarity of anomalies, which leads to extremely imbalanced data, and the impracticality of extensive frame-level annotation for supervised learning. This paper introduces MissionGNN, a novel hierarchical graph neural network (GNN) based model that addresses these challenges by leveraging a state-of-the-art large language model and a comprehensive knowledge graph for efficient weakly supervised learning in VAR.
arXiv Detail & Related papers (2024-06-27T01:09:07Z)
- Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation [72.90144343056227]
We explore the visual representations produced from a pre-trained text-to-video (T2V) diffusion model for video understanding tasks.
We introduce a novel framework, termed "VD-IT", tailored with dedicatedly designed components built upon a fixed T2V model.
Our VD-IT achieves highly competitive results, surpassing many existing state-of-the-art methods.
arXiv Detail & Related papers (2024-03-18T17:59:58Z)
- Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos across modalities.
We present two benchmarks, UCFCrime-AR and XDViolence-AR, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
- Video Abnormal Event Detection by Learning to Complete Visual Cloze Tests [50.1446994599891]
Video abnormal event detection (VAD) is a vital semi-supervised task that requires learning with only roughly labeled normal videos.
We propose a novel approach named visual cloze completion (VCC), which performs VAD by learning to complete "visual cloze tests" (VCTs); see the cloze sketch at the end of this list.
We show that VCC achieves state-of-the-art VAD performance.
arXiv Detail & Related papers (2021-08-05T04:05:36Z)
- Cloze Test Helps: Effective Video Anomaly Detection via Learning to Complete Video Events [41.500063839748094]
Video anomaly detection (VAD) has made fruitful progress via deep neural networks (DNNs).
Inspired by the cloze tests frequently used in language study, we propose a brand-new VAD solution named Video Event Completion (VEC).
VEC consistently outperforms state-of-the-art methods by a notable margin (typically 1.5%-5% AUROC) on commonly-used VAD benchmarks.
arXiv Detail & Related papers (2020-08-27T08:32:51Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method leverages per-frame person detectors that have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid (see the MIL sketch below).
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
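The cloze idea behind VEC and VCC above can be sketched independently of either paper's specifics: erase a frame from a short clip, train a network to complete it, and score anomalies by the completion error. Below is a minimal, hypothetical PyTorch sketch; the `ClozeCompleter` architecture and shapes are illustrative assumptions, not the papers' models.

```python
# Illustrative "visual cloze" sketch: erase one frame from a clip, train a
# network to complete it, and use completion error as an anomaly score.
import torch
import torch.nn as nn

class ClozeCompleter(nn.Module):
    """Predict the erased middle frame from the remaining frames."""
    def __init__(self, context_frames: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * context_frames, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_frames, 3, H, W) -> stack along channels
        b, t, c, h, w = context.shape
        return self.net(context.reshape(b, t * c, h, w))

# Toy usage: clip of 5 frames, erase the middle one (index 2).
clip = torch.rand(2, 5, 3, 64, 64)
target = clip[:, 2]
context = torch.cat([clip[:, :2], clip[:, 3:]], dim=1)
model = ClozeCompleter(context_frames=4)
pred = model(context)
# Per-video anomaly score = completion error on the erased frame.
score = ((pred - target) ** 2).mean(dim=(1, 2, 3))
print(score.shape)  # torch.Size([2])
```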
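Several of the papers above rest on the multiple instance learning (MIL) formulation of weakly supervised detection: a video is a bag of snippets, and only the bag label is known. The standard MIL ranking objective (after Sultani et al., 2018) asks the top-scoring snippet of an anomalous video to outrank the top-scoring snippet of a normal one. A minimal sketch, assuming per-snippet scores from any backbone; names and shapes are illustrative.

```python
# Minimal sketch of the standard MIL ranking objective used in WS-VAD:
# the top-scoring snippet of an anomalous video should outrank the
# top-scoring snippet of a normal video by a margin.
import torch

def mil_ranking_loss(anom_scores: torch.Tensor,
                     norm_scores: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """anom_scores, norm_scores: (batch, num_snippets) per-snippet scores
    in [0, 1] for anomalous-labeled and normal-labeled videos."""
    top_anom = anom_scores.max(dim=1).values  # bag score = max over snippets
    top_norm = norm_scores.max(dim=1).values
    return torch.clamp(margin - top_anom + top_norm, min=0).mean()

# Toy usage with random scores for 4 video pairs of 32 snippets each.
anom = torch.rand(4, 32)
norm = torch.rand(4, 32)
print(mil_ranking_loss(anom, norm))
```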