CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised
Video Anomaly Detection
- URL: http://arxiv.org/abs/2212.05136v3
- Date: Mon, 3 Jul 2023 23:03:22 GMT
- Title: CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised
Video Anomaly Detection
- Authors: Hyekang Kevin Joo, Khoa Vo, Kashu Yamazaki, Ngan Le
- Abstract summary: Video anomaly detection (VAD) is a challenging problem in video surveillance where the frames of anomaly need to be localized in an untrimmed video.
We first propose to utilize the ViT-encoded visual features from CLIP, in contrast with the conventional C3D or I3D features in the domain, to efficiently extract discriminative representations in the novel technique.
Our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on three commonly-used benchmark datasets in the VAD problem.
- Score: 3.146076597280736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection (VAD) -- commonly formulated as a multiple-instance
learning problem in a weakly-supervised manner due to its labor-intensive
nature -- is a challenging problem in video surveillance where the frames of
anomaly need to be localized in an untrimmed video. In this paper, we first
propose to utilize the ViT-encoded visual features from CLIP, in contrast with
the conventional C3D or I3D features in the domain, to efficiently extract
discriminative representations in the novel technique. We then model temporal
dependencies and nominate the snippets of interest by leveraging our proposed
Temporal Self-Attention (TSA). The ablation study confirms the effectiveness of
TSA and ViT feature. The extensive experiments show that our proposed CLIP-TSA
outperforms the existing state-of-the-art (SOTA) methods by a large margin on
three commonly-used benchmark datasets in the VAD problem (UCF-Crime,
ShanghaiTech Campus, and XD-Violence). Our source code is available at
https://github.com/joos2010kj/CLIP-TSA.
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learnstemporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs)
Our method achieves state-of-theart performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - Dynamic Erasing Network Based on Multi-Scale Temporal Features for
Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z) - Delving into CLIP latent space for Video Anomaly Recognition [24.37974279994544]
We introduce the novel method AnomalyCLIP, the first to combine Large Language and Vision (LLV) models, such as CLIP.
Our approach specifically involves manipulating the latent CLIP feature space to identify the normal event subspace.
When anomalous frames are projected onto these directions, they exhibit a large feature magnitude if they belong to a particular class.
arXiv Detail & Related papers (2023-10-04T14:01:55Z) - TeD-SPAD: Temporal Distinctiveness for Self-supervised
Privacy-preservation for video Anomaly Detection [59.04634695294402]
Video anomaly detection (VAD) without human monitoring is a complex computer vision task.
Privacy leakage in VAD allows models to pick up and amplify unnecessary biases related to people's personal information.
We propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.
arXiv Detail & Related papers (2023-08-21T22:42:55Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous respectable works have made decent success, but they only focus on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE)
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z) - Spatio-Temporal Self-Attention Network for Video Saliency Prediction [13.873682190242365]
3D convolutional neural networks have achieved promising results for video tasks in computer vision.
We propose a novel Spatio-Temporal Self-Temporal Self-Attention 3 Network (STSANet) for video saliency prediction.
arXiv Detail & Related papers (2021-08-24T12:52:47Z) - Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z) - 3D ResNet with Ranking Loss Function for Abnormal Activity Detection in
Videos [6.692686655277163]
This study is motivated by the recent state-of-art work of abnormal activity detection.
In the absence of temporal-annotations, such a model is prone to give a false alarm while detecting the abnormalities.
In this paper, we focus on the task of minimizing the false alarm rate while performing an abnormal activity detection task.
arXiv Detail & Related papers (2020-02-04T05:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.