Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw
Puzzles
- URL: http://arxiv.org/abs/2207.10172v2
- Date: Fri, 22 Jul 2022 03:28:41 GMT
- Title: Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw
Puzzles
- Authors: Guodong Wang, Yunhong Wang, Jie Qin, Dongming Zhang, Xiuguo Bao, Di
Huang
- Abstract summary: Video Anomaly Detection (VAD) is an important topic in computer vision.
Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task.
Our method outperforms state-of-the-art counterparts on three public benchmarks.
- Score: 67.39567701983357
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Video Anomaly Detection (VAD) is an important topic in computer vision.
Motivated by the recent advances in self-supervised learning, this paper
addresses VAD by solving an intuitive yet challenging pretext task, i.e.,
spatio-temporal jigsaw puzzles, which is cast as a multi-label fine-grained
classification problem. Our method exhibits several advantages over existing
works: 1) the spatio-temporal jigsaw puzzles are decoupled in terms of spatial
and temporal dimensions, responsible for capturing highly discriminative
appearance and motion features, respectively; 2) full permutations are used to
provide abundant jigsaw puzzles covering various difficulty levels, allowing
the network to distinguish subtle spatio-temporal differences between normal
and abnormal events; and 3) the pretext task is tackled in an end-to-end manner
without relying on any pre-trained models. Our method outperforms
state-of-the-art counterparts on three public benchmarks. Especially on
ShanghaiTech Campus, the result is superior to reconstruction and
prediction-based methods by a large margin.
Related papers
- DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly [21.497180110855975]
We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks.
Our method treats the elements of a set, whether pieces of 2D patch or 3D object fragments, as nodes of a spatial graph.
We highlight its remarkable reduction in run-time, performing 11 times faster than the quickest optimization-based method for puzzle solving.
arXiv Detail & Related papers (2024-02-29T16:09:12Z) - Long-Short Temporal Co-Teaching for Weakly Supervised Video Anomaly
Detection [14.721615285883423]
Weakly supervised anomaly detection (WS-VAD) is a challenging problem that aims to learn VAD models only with video-level annotations.
Our proposed method is able to better deal with anomalies with varying durations as well as subtle anomalies.
arXiv Detail & Related papers (2023-03-31T13:28:06Z) - Spatio-temporal predictive tasks for abnormal event detection in videos [60.02503434201552]
We propose new constrained pretext tasks to learn object level normality patterns.
Our approach consists in learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics.
Experiments on several benchmark datasets demonstrate the effectiveness of our approach to localize and track anomalies.
arXiv Detail & Related papers (2022-10-27T19:45:12Z) - BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection
with Dynamic Temporal Stereo [15.479670314689418]
We introduce an effective temporal stereo method to dynamically select the scale of matching candidates.
We design an iterative algorithm to update more valuable candidates, making it adaptive to moving candidates.
BEVStereo achieves the new state-of-the-art performance on the camera-only track of nuScenes dataset.
arXiv Detail & Related papers (2022-09-21T10:21:25Z) - SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video
Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z) - Recurrent Multi-view Alignment Network for Unsupervised Surface
Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z) - Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z) - Egocentric Action Recognition by Video Attention and Temporal Context [83.57475598382146]
We present the submission of Samsung AI Centre Cambridge to the CVPR 2020 EPIC-Kitchens Action Recognition Challenge.
In this challenge, action recognition is posed as the problem of simultaneously predicting a single verb' and noun' class label given an input trimmed video clip.
Our solution achieves strong performance on the challenge metrics without using object-specific reasoning nor extra training data.
arXiv Detail & Related papers (2020-07-03T18:00:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.