Related papers: Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline

Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline

URL: http://arxiv.org/abs/2404.00847v1
Date: Mon, 1 Apr 2024 01:25:06 GMT
Title: Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
Authors: Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan, Karthik Nandakumar,
Abstract summary: Unsupervised (US) video anomaly detection (VAD) in surveillance applications is gaining more popularity. In this paper, we propose a new baseline for anomaly detection capable of localizing anomalous events in complex surveillance videos in a fully unsupervised fashion. We modify existing VAD datasets to extensively evaluate our approach as well as existing US SOTA methods on two large-scale datasets.
Score: 7.917971102697765
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Unsupervised (US) video anomaly detection (VAD) in surveillance applications is gaining more popularity recently due to its practical real-world applications. As surveillance videos are privacy sensitive and the availability of large-scale video data may enable better US-VAD systems, collaborative learning can be highly rewarding in this setting. However, due to the extremely challenging nature of the US-VAD task, where learning is carried out without any annotations, privacy-preserving collaborative learning of US-VAD systems has not been studied yet. In this paper, we propose a new baseline for anomaly detection capable of localizing anomalous events in complex surveillance videos in a fully unsupervised fashion without any labels on a privacy-preserving participant-based distributed training configuration. Additionally, we propose three new evaluation protocols to benchmark anomaly detection approaches on various scenarios of collaborations and data availability. Based on these protocols, we modify existing VAD datasets to extensively evaluate our approach as well as existing US SOTA methods on two large-scale datasets including UCF-Crime and XD-Violence. All proposed evaluation protocols, dataset splits, and codes are available here: https://github.com/AnasEmad11/CLAP

Related papers

Adapting Vision-Language Models Without Labels: A Comprehensive Survey [74.17944178027015]
Vision-Language Models (VLMs) have demonstrated remarkable generalization capabilities across a wide range of tasks.<n>Recent research has increasingly focused on unsupervised adaptation methods that do not rely on labeled data.<n>We propose a taxonomy based on the availability and nature of unlabeled visual data, categorizing existing approaches into four key paradigms.
arXiv Detail & Related papers (2025-08-07T16:27:37Z)
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification [32.64320734048732]
In real-world surveillance contexts, data is distributed across multiple devices/entities, raising privacy and ownership concerns. We propose L2RW, a benchmark that brings VI-ReID closer to real-world applications.
arXiv Detail & Related papers (2025-03-15T18:56:29Z)
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM [1.7051307941715268]
Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. Existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. This study proposes customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model.
arXiv Detail & Related papers (2025-03-06T14:52:34Z)
Privacy-Preserving Video Anomaly Detection: A Survey [10.899433437231139]
Video Anomaly Detection (VAD) aims to automatically analyze patterns in surveillance videos collected from open spaces to detect anomalous events that may cause harm without physical contact. The lack of transparency in video transmission and usage raises public concerns about privacy and ethics limiting the real-world application of VAD. Recently, researchers have focused on privacy concerns in VAD by conducting systematic studies from various perspectives including data, features, and systems. This article systematically reviews progress in P2VAD for the first time, defining its scope and providing an intuitive taxonomy.
arXiv Detail & Related papers (2024-11-21T20:29:59Z)
Deep Learning for Video Anomaly Detection: A Review [52.74513211976795]
Video anomaly detection (VAD) aims to discover behaviors or events deviating from the normality in videos. In the era of deep learning, a great variety of deep learning based methods are constantly emerging for the VAD task. This review covers the spectrum of five different categories, namely, semi-supervised, weakly supervised, fully supervised, unsupervised and open-set supervised VAD.
arXiv Detail & Related papers (2024-09-09T07:31:16Z)
Open-Vocabulary Spatio-Temporal Action Detection [59.91046192096296]
Open-vocabulary-temporal action detection (OV-STAD) is an important fine-grained video understanding task. OV-STAD requires training a model on a limited set of base classes with box and label supervision. To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
arXiv Detail & Related papers (2024-05-17T14:52:47Z)
Harnessing Large Language Models for Training-free Video Anomaly Detection [34.76811491190446]
Video anomaly detection (VAD) aims to temporally locate abnormal events in a video. Training-based methods are prone to be domain-specific, thus being costly for practical deployment. We propose LAnguage-based VAD (LAVAD), a method tackling VAD in a novel, training-free paradigm.
arXiv Detail & Related papers (2024-04-01T09:34:55Z)
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges [10.809558232493236]
We propose a new research direction of surveillance video-and-language understanding, and construct the first multimodal surveillance video dataset. We manually annotate the real-world surveillance dataset UCF-Crime with fine-grained event content and timing. We benchmark SOTA models for four multimodal tasks on this newly created dataset, which serve as new baselines for surveillance video-and-language understanding.
arXiv Detail & Related papers (2023-09-25T07:46:56Z)
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples [128.25509832644025]
There is a growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet. UEs are training samples added with invisible but unlearnable noise, which have been found can prevent unauthorized training of machine learning models. We present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations.
arXiv Detail & Related papers (2022-12-31T04:26:25Z)
Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model [0.0]
Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. We design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.
arXiv Detail & Related papers (2022-09-23T08:29:16Z)
Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects. We tackle this problem from two different angles: algorithm and dataset. We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV) NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones. We show that NPC-LV outperforms supervised methods on all three datasets on image classification in low data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
Generative Cooperative Learning for Unsupervised Video Anomaly Detection [29.07998538748002]
We propose a novel unsupervised Generative Cooperative Learning (GCL) approach for video anomaly detection. In essence, both networks get trained in a cooperative fashion, thereby allowing unsupervised learning. We conduct extensive experiments on two large-scale video anomaly detection datasets.
arXiv Detail & Related papers (2022-03-08T09:36:51Z)
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a prohibitive-level action recognition model that is trained with only video-frame labels. Our method per person detectors have been trained on large image datasets within Multiple Instance Learning framework. We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.