Multi-scale Spatial-temporal Interaction Network for Video Anomaly Detection
- URL: http://arxiv.org/abs/2306.10239v2
- Date: Thu, 6 Jul 2023 04:38:44 GMT
- Title: Multi-scale Spatial-temporal Interaction Network for Video Anomaly Detection
- Authors: Zhiyuan Ning, Zhangxun Li, Zhengliang Guo, Zile Wang, Liang Song
- Abstract summary: Video Anomaly Detection (VAD) is an essential yet challenging task in signal processing.
We propose a Multi-scale Spatial-Temporal Interaction Network (MSTI-Net) for VAD.
- Score: 3.113134714967787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Anomaly Detection (VAD) is an essential yet challenging task in signal
processing. Since certain anomalies cannot be detected by isolated analysis of
either temporal or spatial information, the interaction between these two types
of data is considered crucial for VAD. However, current dual-stream
architectures either confine this integral interaction to the bottleneck of the
autoencoder or introduce anomaly-irrelevant background pixels into the
interactive process, hindering the accuracy of VAD. To address these
deficiencies, we propose a Multi-scale Spatial-Temporal Interaction Network
(MSTI-Net) for VAD. First, to prioritize the detection of moving objects in the
scene and harmonize the substantial semantic discrepancies between the two
types of data, we propose an Attention-based Spatial-Temporal Fusion Module
(ASTFM) as a substitute for the conventional direct fusion. Furthermore, we
inject multi-ASTFM-based connections that bridge the appearance and motion
streams of the dual-stream network, thus fostering multi-scale spatial-temporal
interaction. Finally, to bolster the delineation between normal and abnormal
activities, our system records the regular information in a memory module.
Experimental results on three benchmark datasets validate the effectiveness of
our approach, which achieves AUCs of 96.8%, 87.6%, and 73.9% on the UCSD Ped2,
CUHK Avenue, and ShanghaiTech datasets, respectively.
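As a rough illustration of the fusion idea (not the authors' released code), the following PyTorch sketch shows one way an attention-based spatial-temporal fusion step could gate appearance features with motion-derived attention before fusing the two streams. The module name ASTFMSketch, the feature shapes, and the sigmoid gating scheme are assumptions.

```python
# Hypothetical sketch of an attention-based spatial-temporal fusion step,
# loosely following the ASTFM idea in the abstract: motion features produce
# an attention map that re-weights appearance features before fusion.
# Names and shapes are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class ASTFMSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Motion-conditioned spatial attention: 1x1 conv + sigmoid gate.
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Fuse the gated appearance features with the motion features.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, appearance: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # appearance, motion: (B, C, H, W) feature maps at the same scale.
        gate = self.attn(motion)     # emphasize moving regions
        gated = appearance * gate    # suppress static background pixels
        return self.fuse(torch.cat([gated, motion], dim=1))

# Usage: one fusion block per encoder scale, bridging the two streams.
astfm = ASTFMSketch(channels=64)
out = astfm(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```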
Related papers
- Multimodal Attention-Enhanced Feature Fusion-based Weakly Supervised Anomaly Violence Detection [1.9223495770071632]
This system uses three feature streams: RGB video, optical flow, and audio signals, where each stream extracts complementary spatial and temporal features.
The system significantly improves anomaly detection accuracy and robustness across three datasets.
arXiv Detail & Related papers (2024-09-17T14:17:52Z)
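As a hedged illustration of the three-stream idea above, here is a minimal PyTorch sketch of attention-weighted late fusion over RGB, optical-flow, and audio features. The class name ThreeStreamFusion, the clip-level feature shapes, and the softmax weighting are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of attention-weighted late fusion over three feature
# streams (RGB, optical flow, audio). Dimensions and the softmax-gated
# weighting are illustrative assumptions.
import torch
import torch.nn as nn

class ThreeStreamFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # One scalar attention logit per stream, computed from its features.
        self.score = nn.Linear(dim, 1)

    def forward(self, rgb, flow, audio):
        # Each input: (B, dim) clip-level features from one stream.
        feats = torch.stack([rgb, flow, audio], dim=1)     # (B, 3, dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (B, 3, 1)
        return (weights * feats).sum(dim=1)                # (B, dim)

fusion = ThreeStreamFusion(dim=128)
fused = fusion(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
```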
- DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
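The dual-attention idea above can be sketched as follows: a frame attention over time steps and a skeleton attention over joint partitions, each re-weighting the input. The name DualAttentionSketch, the (B, T, J, C) layout, and the mean-pooled scoring are assumptions rather than the DA-Flow code.

```python
# Hypothetical sketch of a dual-attention module over skeletal data:
# frame attention weights time steps, skeleton attention weights joint
# partitions. Shapes and the averaging-based scoring are assumptions.
import torch
import torch.nn as nn

class DualAttentionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.frame_score = nn.Linear(channels, 1)  # scores each frame
        self.joint_score = nn.Linear(channels, 1)  # scores each partition

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, J, C) = batch, frames, joint partitions, channels.
        fa = torch.softmax(self.frame_score(x.mean(dim=2)), dim=1)  # (B, T, 1)
        ja = torch.softmax(self.joint_score(x.mean(dim=1)), dim=1)  # (B, J, 1)
        # Re-weight along both the temporal and the skeletal dimension.
        return x * fa.unsqueeze(2) * ja.unsqueeze(1)

dam = DualAttentionSketch(channels=32)
y = dam(torch.randn(2, 24, 5, 32))  # 24 frames, 5 partitions
```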
- Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z)
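For intuition, the inter-patch relationship prediction pretext task above might be set up roughly as in the following sketch, which embeds two patches and classifies their relative spatial position. The 8-way labeling, the patch size, and all names are illustrative assumptions, not the paper's design.

```python
# Hypothetical sketch of an inter-patch relation prediction pretext task:
# embed two patches and classify their relative position. The 8-way
# relative-position labeling is an assumption for illustration.
import torch
import torch.nn as nn

class PatchRelationSketch(nn.Module):
    def __init__(self, num_relations: int = 8):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # patch -> 32-d vector
        )
        self.head = nn.Linear(2 * 32, num_relations)  # pairwise classifier

    def forward(self, patch_a: torch.Tensor, patch_b: torch.Tensor):
        # patch_a, patch_b: (B, 3, 16, 16) image patches.
        pair = torch.cat([self.embed(patch_a), self.embed(patch_b)], dim=1)
        return self.head(pair)  # logits over relative-position classes

model = PatchRelationSketch()
logits = model(torch.randn(4, 3, 16, 16), torch.randn(4, 3, 16, 16))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 8, (4,)))
```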
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD in both detection accuracy and runtime efficiency.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- Learning Appearance-motion Normality for Video Anomaly Detection [11.658792932975652]
We propose a spatial-temporal memories-augmented two-stream auto-encoder framework.
It learns the appearance normality and motion normality independently and explores the correlations via adversarial learning.
Our framework outperforms the state-of-the-art methods, achieving AUCs of 98.1% and 89.8% on UCSD Ped2 and CUHK Avenue datasets.
arXiv Detail & Related papers (2022-07-27T08:30:19Z)
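Both this paper and MSTI-Net record regular patterns in a memory module; as a hedged sketch of that general mechanism, the PyTorch snippet below reads from a bank of learned normality prototypes via softmax-normalized cosine similarity. The slot count, the similarity measure, and the name MemorySketch are assumptions, not either paper's implementation.

```python
# Hypothetical sketch of a normality memory module: encoder features query
# a bank of learned "normal" prototypes via softmax-normalized similarity,
# and the read-out replaces (or augments) the query at the bottleneck.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemorySketch(nn.Module):
    def __init__(self, slots: int = 10, dim: int = 64):
        super().__init__()
        # Learned prototypes of normal appearance/motion patterns.
        self.memory = nn.Parameter(torch.randn(slots, dim))

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (B, dim). Address slots by cosine similarity.
        sim = F.cosine_similarity(query.unsqueeze(1), self.memory.unsqueeze(0), dim=-1)
        w = torch.softmax(sim, dim=1)  # (B, slots) addressing weights
        return w @ self.memory         # read-out: (B, dim)

mem = MemorySketch()
readout = mem(torch.randn(8, 64))
```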
- Unsupervised Deep Anomaly Detection for Multi-Sensor Time-Series Signals [10.866594993485226]
We propose a novel deep learning-based anomaly detection algorithm called Deep Convolutional Autoencoding Memory network (CAE-M).
We first build a Deep Convolutional Autoencoder to characterize the spatial dependence of multi-sensor data with a Maximum Mean Discrepancy (MMD).
Then, we construct a Memory Network consisting of linear (Autoregressive Model) and non-linear (Bidirectional LSTM with Attention) predictions to capture temporal dependence from time-series data.
arXiv Detail & Related papers (2021-07-27T06:48:20Z)
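The linear/non-linear pairing in CAE-M's memory network can be illustrated roughly as below: a linear autoregressive head and a bidirectional LSTM head both predict the next latent step (the attention component is omitted for brevity). The window size, the simple averaging of the two predictions, and all names are assumptions.

```python
# Hypothetical sketch pairing a linear autoregressive predictor with a
# bidirectional LSTM for next-step prediction, in the spirit of CAE-M's
# linear/non-linear memory network. Attention is omitted for brevity.
import torch
import torch.nn as nn

class LinearPlusLSTMSketch(nn.Module):
    def __init__(self, dim: int, window: int = 16):
        super().__init__()
        self.ar = nn.Linear(window * dim, dim)  # linear AR over the window
        self.lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * dim, dim)     # map BiLSTM output to dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, window, dim) latent sequence; predict the next step.
        linear_pred = self.ar(x.flatten(1))
        h, _ = self.lstm(x)                   # (B, window, 2*dim)
        nonlinear_pred = self.proj(h[:, -1])  # last time step
        return 0.5 * (linear_pred + nonlinear_pred)

model = LinearPlusLSTMSketch(dim=32)
pred = model(torch.randn(4, 16, 32))
```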
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we construct a joint feature sequence based on the sequence and instant state information to make the generated trajectories keep spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.