Spatio-Temporal Relation Learning for Video Anomaly Detection
- URL: http://arxiv.org/abs/2209.13116v1
- Date: Tue, 27 Sep 2022 02:19:31 GMT
- Title: Spatio-Temporal Relation Learning for Video Anomaly Detection
- Authors: Hui Lv, Zhen Cui, Biao Wang, Jian Yang
- Abstract summary: Anomaly identification is highly dependent on the relationship between the object and the scene.
In this paper, we propose a Spatial-Temporal Relation Learning framework to tackle the video anomaly detection task.
Experiments are conducted on three public datasets, and the superior performance over the state-of-the-art methods demonstrates the effectiveness of our method.
- Score: 35.59510027883497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly identification is highly dependent on the relationship between the
object and the scene: the same action may be normal in one scene yet anomalous in
another, and different actions within the same scene can differ in how anomalous
they are. Therefore, the object-scene
relation actually plays a crucial role in anomaly detection but is inadequately
explored in previous works. In this paper, we propose a Spatial-Temporal
Relation Learning (STRL) framework to tackle the video anomaly detection task.
First, considering dynamic characteristics of the objects as well as scene
areas, we construct a Spatio-Temporal Auto-Encoder (STAE) to jointly exploit
spatial and temporal evolution patterns for representation learning. For better
pattern extraction, two decoding branches are designed in the STAE module: an
appearance branch that captures spatial cues by directly predicting the next
frame, and a motion branch that models the dynamics via optical flow
prediction. Then, to concretize the object-scene relation, a Relation Learning
(RL) module is devised to analyze and summarize the normal relations by
introducing the Knowledge Graph Embedding methodology. In this process, the
plausibility of an object-scene relation is measured by jointly modeling
object/scene features and optimizable object-scene relation maps.
Extensive experiments are conducted on three public datasets, and the superior
performance over the state-of-the-art methods demonstrates the effectiveness of
our method.
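The abstract describes two components that translate naturally into code: a Spatio-Temporal Auto-Encoder (STAE) with an appearance branch and a motion branch, and a Relation Learning module that scores object-scene plausibility with optimizable relation maps. The sketch below is a minimal, illustrative PyTorch rendering of those ideas, not the authors' implementation; the layer sizes, module names (STAE, RelationScorer), the bilinear RESCAL-style scoring, and the feature pooling are all assumptions made for the example.

```python
# Illustrative sketch only: layer sizes, names, and the bilinear relation
# score are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class STAE(nn.Module):
    """Spatio-Temporal Auto-Encoder sketch: a shared encoder over a short clip,
    an appearance branch that predicts the next frame, and a motion branch that
    predicts optical flow."""

    def __init__(self, in_frames: int = 4, feat_dim: int = 128):
        super().__init__()
        # Encoder over stacked input frames (B, in_frames*3, H, W).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_frames * 3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Appearance branch: predict the next RGB frame (spatial cues).
        self.appearance_decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
        )
        # Motion branch: predict a 2-channel optical-flow field (dynamics).
        self.motion_decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 2, 4, stride=2, padding=1),
        )

    def forward(self, clip: torch.Tensor):
        feat = self.encoder(clip)                   # (B, feat_dim, H/4, W/4)
        next_frame = self.appearance_decoder(feat)  # appearance prediction
        flow = self.motion_decoder(feat)            # motion prediction
        return feat, next_frame, flow


class RelationScorer(nn.Module):
    """Relation Learning sketch: score the plausibility of an (object, scene)
    pair with learnable relation maps, in the spirit of bilinear Knowledge
    Graph Embedding scorers such as RESCAL."""

    def __init__(self, feat_dim: int = 128, num_relations: int = 8):
        super().__init__()
        # One optimizable relation map per latent relation type.
        self.relation_maps = nn.Parameter(
            torch.randn(num_relations, feat_dim, feat_dim) * 0.01
        )

    def forward(self, obj_feat: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
        # obj_feat, scene_feat: (B, feat_dim).
        # Bilinear score obj^T R scene for every relation map, then take the best.
        scores = torch.einsum("bd,rde,be->br", obj_feat, self.relation_maps, scene_feat)
        return scores.max(dim=1).values  # higher = more plausible (more "normal")


if __name__ == "__main__":
    stae = STAE()
    clip = torch.randn(2, 4 * 3, 64, 64)   # 2 clips of 4 stacked RGB frames
    feat, next_frame, flow = stae(clip)
    pooled = feat.mean(dim=(2, 3))          # crude pooling, for illustration only
    scorer = RelationScorer()
    print(scorer(pooled, pooled).shape)     # torch.Size([2])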
Related papers
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z)
- Object-centric and memory-guided normality reconstruction for video anomaly detection [56.64792194894702]
This paper addresses the anomaly detection problem in video surveillance.
Due to the inherent rarity and heterogeneity of abnormal events, the problem is approached as a normality modeling task.
Our model learns object-centric normal patterns without seeing anomalous samples during training.
arXiv Detail & Related papers (2022-03-07T19:28:39Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Multiple Object Tracking with Correlation Learning [16.959379957515974]
We propose to exploit the local correlation module to model the topological relationship between targets and their surrounding environment.
Specifically, we establish dense correspondences of each spatial location and its context, and explicitly constrain the correlation volumes through self-supervised learning.
Our approach demonstrates the effectiveness of correlation learning with the superior performance and obtains state-of-the-art MOTA of 76.5% and IDF1 of 73.6% on MOT17.
arXiv Detail & Related papers (2021-04-08T06:48:02Z)
- Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models relational-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.