Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection
- URL: http://arxiv.org/abs/2107.13720v1
- Date: Thu, 29 Jul 2021 03:07:25 GMT
- Title: Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection
- Authors: Xinyang Feng, Dongjin Song, Yuncong Chen, Zhengzhang Chen, Jingchao Ni, Haifeng Chen
- Abstract summary: We propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection.
Its convolutional transformer contains three key components, i.e., a convolutional encoder to capture the spatial information of input clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to integrate spatio-temporal features and predict the future frame.
- Score: 27.433162897608543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting abnormal activities in real-world surveillance videos is an
important yet challenging task, as prior knowledge about video anomalies is
usually limited or unavailable. Although many approaches have been
developed to resolve this problem, few of them can capture the normal
spatio-temporal patterns effectively and efficiently. Moreover, existing works
seldom explicitly consider the local consistency at frame level and global
coherence of temporal dynamics in video sequences. To this end, we propose
Convolutional Transformer based Dual Discriminator Generative Adversarial
Networks (CT-D2GAN) to perform unsupervised video anomaly detection.
Specifically, we first present a convolutional transformer to perform future
frame prediction. It contains three key components, i.e., a convolutional
encoder to capture the spatial information of the input video clips, a temporal
self-attention module to encode the temporal dynamics, and a convolutional
decoder to integrate spatio-temporal features and predict the future frame.
Next, a dual discriminator based adversarial training procedure, which jointly
considers an image discriminator that can maintain the local consistency at
frame-level and a video discriminator that can enforce the global coherence of
temporal dynamics, is employed to enhance the future frame prediction. Finally,
the prediction error is used to identify abnormal video frames. Thorough
empirical studies on three public video anomaly detection datasets, i.e., UCSD
Ped2, CUHK Avenue, and Shanghai Tech Campus, demonstrate the effectiveness of
the proposed adversarial spatio-temporal modeling framework.
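The abstract describes the temporal self-attention module only at a high level. Below is a minimal NumPy sketch of scaled dot-product self-attention along the time axis, assuming each input frame has already been reduced to a single feature vector by the convolutional encoder. This is a simplification: the paper's module works on convolutional features, and the function name `temporal_self_attention` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical (randomly initialised here, not learned).

```python
import numpy as np


def temporal_self_attention(feats, w_q, w_k, w_v):
    """Scaled dot-product self-attention across the time axis.

    feats: (T, D) array, one feature vector per input frame (assumption:
           the paper's convolutional feature maps are flattened away here).
    w_q, w_k, w_v: (D, D_k) projection matrices (hypothetical, random here).
    Returns the (T, D_k) attended features and the (T, T) attention weights.
    """
    q = feats @ w_q  # queries, (T, D_k)
    k = feats @ w_k  # keys,    (T, D_k)
    v = feats @ w_v  # values,  (T, D_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (T, T) frame-to-frame similarity
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over time steps
    return weights @ v, weights


rng = np.random.default_rng(0)
T, D, D_k = 4, 8, 8  # a clip of 4 frames with 8-dim features (illustrative sizes)
feats = rng.standard_normal((T, D))
w_q, w_k, w_v = (rng.standard_normal((D, D_k)) for _ in range(3))
out, attn = temporal_self_attention(feats, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

Each row of `attn` sums to 1, so every output time step is a convex combination of the value vectors of all frames in the clip. This is what lets the representation used for predicting the next frame draw on the whole clip rather than only on adjacent frames.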
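The abstract states only that the prediction error is used to identify abnormal frames. One common convention in frame-prediction anomaly detection, assumed here rather than taken from the paper, is to compute the PSNR between each predicted and actual frame, min-max normalise it over the test video, and invert it so that poorly predicted frames receive high scores. The function names `psnr` and `anomaly_scores` are illustrative, and the frames below are synthetic.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between a predicted and an actual frame."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / max(mse, 1e-12))  # guard against mse == 0


def anomaly_scores(preds, targets):
    """Per-frame anomaly scores in [0, 1] for one video; higher = more anomalous.

    Assumed scoring rule (not confirmed by the abstract): min-max normalise
    PSNR over the video and invert it, so badly predicted frames score near 1.
    """
    p = np.array([psnr(a, b) for a, b in zip(preds, targets)])
    span = p.max() - p.min()
    if span == 0:  # every frame predicted equally well: no evidence of an anomaly
        return np.zeros_like(p)
    return 1.0 - (p - p.min()) / span


# Synthetic example: 5 frames of 32x32 pixels with values in [0, 1].
rng = np.random.default_rng(0)
targets = rng.random((5, 32, 32))
preds = np.clip(targets + 0.01 * rng.standard_normal(targets.shape), 0.0, 1.0)
# Simulate an anomaly: the predictor fails badly on frame 3.
preds[3] = np.clip(targets[3] + 0.3 * rng.standard_normal((32, 32)), 0.0, 1.0)
scores = anomaly_scores(preds, targets)
print(scores.argmax())  # 3
```

Normalising per video maps each video's PSNR range onto [0, 1], which removes differences in overall prediction quality between scenes; the trade-off is that scores are only comparable within a single video.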
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
Date: 2024-08-12T03:31:29Z
- Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
Date: 2024-03-28T03:07:16Z
- Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
Date: 2023-12-04T09:40:11Z
- Video Anomaly Detection using GAN [0.0]
This thesis aims to automate the monitoring of surveillance footage, so that human operators no longer need to watch the recordings for unusual activity.
We have developed a novel generative adversarial network (GAN) based anomaly detection model.
Date: 2023-11-23T16:41:30Z
- Delving into CLIP latent space for Video Anomaly Recognition [24.37974279994544]
We introduce the novel method AnomalyCLIP, the first to combine Large Language and Vision (LLV) models, such as CLIP.
Our approach specifically involves manipulating the latent CLIP feature space to identify the normal event subspace.
When anomalous frames are projected onto these directions, they exhibit a large feature magnitude if they belong to a particular class.
Date: 2023-10-04T14:01:55Z
- A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos [107.96514633713034]
We propose a spatial-temporal deformable attention based framework, named STNet.
Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion.
Experiments on the public breast lesion ultrasound video dataset show that our STNet obtains a state-of-the-art detection performance.
Date: 2023-09-09T07:00:10Z
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
The vulnerability of deep neural networks to adversarial perturbations is widely recognized in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
Date: 2023-05-18T10:18:59Z
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework achieves better performance than most state-of-the-art methods.
Date: 2023-04-27T12:16:44Z
- Video Anomaly Detection via Prediction Network with Enhanced Spatio-Temporal Memory Exchange [21.334952965297667]
Video anomaly detection is a challenging task because most anomalies are scarce and non-deterministic.
We design a Convolutional LSTM Auto-Encoder prediction framework with enhanced spatio-temporal memory exchange.
Evaluations on three popular benchmarks show that our framework outperforms existing prediction-based anomaly detection methods.
Date: 2022-06-26T16:10:56Z
- Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection [22.098399083491937]
Understanding the spatio-temporal context of a video plays a vital role in anomaly detection.
We design a transformer model with three different contextual prediction streams: masked, whole and partial.
By learning to predict the missing frames of consecutive normal frames, our model can effectively learn various normality patterns in the video.
Date: 2022-06-17T05:54:31Z
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
Date: 2020-11-05T11:34:12Z
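The Multi-Path Frame Prediction entry above reports a frame-level AUROC, the usual metric on benchmarks such as UCSD Ped2, CUHK Avenue, and ShanghaiTech. The following is a minimal sketch of computing it from per-frame anomaly scores and 0/1 ground-truth labels, using the pairwise form of the Mann-Whitney U statistic. The name `frame_auroc` is illustrative, the sketch pools all test frames into one set (one common convention; some works instead average a per-video AUROC), and the toy inputs are invented for illustration.

```python
import numpy as np


def frame_auroc(scores, labels):
    """Frame-level AUROC from per-frame anomaly scores and 0/1 labels (1 = abnormal).

    Pairwise form of the Mann-Whitney U statistic: the fraction of
    (abnormal, normal) frame pairs in which the abnormal frame scores higher,
    with ties counted as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one normal and one abnormal frame")
    greater = np.sum(pos[:, None] > neg[None, :])  # abnormal ranked above normal
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy example: frames 2 and 3 are abnormal and receive the highest scores.
print(frame_auroc([0.1, 0.2, 0.8, 0.7, 0.3], [0, 0, 1, 1, 0]))  # 1.0
```

The pairwise comparison costs O(n_pos * n_neg) time and memory, which is fine for a sketch; for full test sets, `sklearn.metrics.roc_auc_score(labels, scores)` computes the same value more efficiently.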
This list is automatically generated from the titles and abstracts of the papers on this site.