Video Event Restoration Based on Keyframes for Video Anomaly Detection
- URL: http://arxiv.org/abs/2304.05112v1
- Date: Tue, 11 Apr 2023 10:13:19 GMT
- Title: Video Event Restoration Based on Keyframes for Video Anomaly Detection
- Authors: Zhiwei Yang, Jing Liu, Zhaoyang Wu, Peng Wu, Xiaotao Liu
- Abstract summary: Existing deep neural network based video anomaly detection (VAD) methods mostly follow the route of frame reconstruction or frame prediction.
We introduce a brand-new VAD paradigm to break through these limitations.
We propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration.
- Score: 9.18057851239942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video anomaly detection (VAD) is a significant computer vision problem.
Existing deep neural network (DNN) based VAD methods mostly follow the route of
frame reconstruction or frame prediction. However, the lack of mining and
learning of higher-level visual features and temporal context relationships in
videos limits the further performance of these two approaches. Inspired by
video codec theory, we introduce a brand-new VAD paradigm to break through
these limitations: first, we propose a new task of video event restoration
based on keyframes. Encouraging the DNN to infer the multiple missing frames
from video keyframes, and thereby restore a video event, more effectively
motivates the DNN to mine and learn potential higher-level visual features and
comprehensive temporal context relationships in the video. To this end, we
propose a novel U-shaped Swin Transformer Network with Dual Skip Connections
(USTN-DSC) for video event restoration, where a cross-attention and a temporal
upsampling residual skip connection are introduced to further assist in
restoring complex static and dynamic motion object features in the video. In
addition, we propose a simple and effective adjacent frame difference loss to
constrain the motion consistency of the video sequence. Extensive experiments
on benchmarks demonstrate that USTN-DSC outperforms most existing methods,
validating the effectiveness of our method.
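As a rough illustration of the two concrete ingredients named in the abstract, the sketch below shows (a) a keyframe-masked input for the restoration task and (b) the adjacent frame difference loss. The abstract does not specify how non-keyframes are encoded or which norm the loss uses, so zero-masking and an L1 norm are assumptions here, not the paper's confirmed implementation; `build_keyframe_input` and `adjacent_frame_difference_loss` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def build_keyframe_input(frames: torch.Tensor, key_idx: list[int]) -> torch.Tensor:
    """Keep only keyframes; the network must restore the rest.

    frames: (B, T, C, H, W) video clip; key_idx: temporal indices of keyframes.
    Zero-masking the missing frames is an assumption -- the paper may encode
    keyframe positions differently.
    """
    masked = torch.zeros_like(frames)
    masked[:, key_idx] = frames[:, key_idx]
    return masked

def adjacent_frame_difference_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Constrain motion consistency by matching adjacent-frame differences.

    pred, target: (B, T, C, H, W) restored and ground-truth sequences.
    """
    pred_motion = pred[:, 1:] - pred[:, :-1]        # frame-to-frame change in the restoration
    target_motion = target[:, 1:] - target[:, :-1]  # frame-to-frame change in the ground truth
    return F.l1_loss(pred_motion, target_motion)    # L1 is an assumption; the norm is unstated
```

In line with the reconstruction/prediction paradigms the paper builds on, a clip whose missing frames cannot be restored well under such an objective would be flagged as anomalous at test time.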
Related papers
- Learning Truncated Causal History Model for Video Restoration [14.381907888022615]
TURTLE learns the truncated causal history model for efficient and high-performing video restoration.
We report new state-of-the-art results on a multitude of video restoration benchmark tasks.
arXiv Detail & Related papers (2024-10-04T21:31:02Z)
- Enhancing Bandwidth Efficiency for Video Motion Transfer Applications using Deep Learning Based Keypoint Prediction [4.60378493357739]
We propose a novel deep learning based prediction framework for enhanced bandwidth reduction in motion transfer enabled video applications.
For real-time applications, our results show the effectiveness of our proposed architecture by enabling up to 2x additional bandwidth reduction.
arXiv Detail & Related papers (2024-03-17T20:36:43Z)
- Event-aware Video Corpus Moment Retrieval [79.48249428428802]
Video Corpus Moment Retrieval (VCMR) is a practical video retrieval task focused on identifying a specific moment within a vast corpus of untrimmed videos.
Existing methods for VCMR typically rely on frame-aware video retrieval, calculating similarities between the query and video frames to rank videos.
We propose EventFormer, a model that explicitly utilizes events within videos as fundamental units for video retrieval.
arXiv Detail & Related papers (2024-02-21T06:55:20Z)
- Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach directly learns the weights of neural modules by optimizing over the corrupted test sequence, leveraging the spatio-temporal coherence and internal statistics of videos.
arXiv Detail & Related papers (2023-12-13T01:57:11Z)
- Deep Unsupervised Key Frame Extraction for Efficient Video Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, which combines a Convolutional Neural Network (CNN) and Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works; one is that it can determine the number of key frames automatically (a sketch of the underlying density-peaks idea follows this list).
Furthermore, a Long Short-Term Memory network (LSTM) is added on the top of the CNN to further elevate the performance of classification.
arXiv Detail & Related papers (2022-11-12T20:45:35Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from limited adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly instead of a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Frame-rate Up-conversion Detection Based on Convolutional Neural Network for Learning Spatiotemporal Features [7.895528973776606]
This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic artifacts caused by frame-rate up-conversion (FRUC) in an end-to-end fashion.
FCDNet takes a stack of consecutive frames as input and effectively learns interpolation artifacts through its network blocks.
arXiv Detail & Related papers (2021-03-25T08:47:46Z)
- Multiple Instance-Based Video Anomaly Detection using Deep Temporal Encoding-Decoding [5.255783459833821]
We propose a weakly supervised deep temporal encoding-decoding solution for anomaly detection in surveillance videos.
The proposed approach uses both abnormal and normal video clips during the training phase.
The results show that the proposed method performs comparably to or better than state-of-the-art solutions for anomaly detection in video surveillance applications (a hedged sketch of a typical objective for this weakly supervised setting follows this list).
arXiv Detail & Related papers (2020-07-03T08:22:42Z)
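The weakly supervised setting above is commonly trained with a multiple-instance ranking objective: segments of abnormal and normal videos are scored, and the most anomalous segment of each abnormal video is pushed above the most anomalous segment of each normal one. The sketch below shows that generic objective; it is the standard formulation for this setting, not necessarily the exact loss of the paper above, and `mil_ranking_loss` is a hypothetical name.

```python
import torch

def mil_ranking_loss(scores_abnormal: torch.Tensor,
                     scores_normal: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """Generic MIL ranking loss for weakly supervised video anomaly detection.

    scores_abnormal, scores_normal: (B, S) anomaly scores for S temporal
    segments of each video; only video-level labels are available, so each
    video is treated as a bag of segment instances.
    """
    top_abnormal = scores_abnormal.max(dim=1).values  # most anomalous segment per abnormal video
    top_normal = scores_normal.max(dim=1).values      # most anomalous segment per normal video
    # Hinge: the abnormal bag's top score should exceed the normal bag's by a margin.
    return torch.relu(margin - top_abnormal + top_normal).mean()
```

In practice, smoothness and sparsity regularizers on the segment scores are often added to this ranking term.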
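For the TSDPC entry referenced earlier, the "determine the number of key frames automatically" property comes from density peaks clustering, where cluster centers emerge as points with both high local density and large distance to any denser point. The sketch below is the plain density-peaks selection over per-frame features (Rodriguez and Laio's formulation); the paper's temporal-segment variant adds constraints not reproduced here, and all names are illustrative.

```python
import numpy as np

def density_peak_keyframes(features: np.ndarray, dc: float, gamma_thresh: float) -> np.ndarray:
    """Pick keyframe indices via density peaks clustering over frame features.

    features: (N, D) per-frame descriptors (e.g., CNN features); dc: cutoff
    distance for the local density; gamma_thresh: threshold on rho * delta.
    The number of selected frames falls out of the threshold automatically.
    """
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)  # (N, N) pairwise distances
    rho = (dist < dc).sum(axis=1) - 1.0  # local density: neighbors within dc (minus self)
    n = len(features)
    delta = np.empty(n)
    for i in range(n):
        denser = np.flatnonzero(rho > rho[i])
        # Distance to the nearest denser point; the densest point gets the max distance.
        delta[i] = dist[i, denser].min() if denser.size else dist[i].max()
    gamma = rho * delta  # high density AND far from denser points => cluster center
    return np.flatnonzero(gamma > gamma_thresh)
```

Thresholding gamma = rho * delta replaces a fixed cluster count, which is how the number of key frames can be obtained without setting it by hand.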