CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
- URL: http://arxiv.org/abs/2404.18952v1
- Date: Sat, 27 Apr 2024 20:09:40 GMT
- Title: CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
- Authors: Damith Chamalke Senadeera, Xiaoyun Yang, Dimitrios Kollias, Gregory Slabaugh
- Abstract summary: We introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance.
By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets.
- Score: 21.354382437543315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods.
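To ground the complexity claim: vanilla self-attention over n tokens materializes an n x n affinity matrix (quadratic in n), whereas additive attention scores each token against a learned vector and pools the queries into a single global vector, so the cost stays linear in n. Below is a minimal PyTorch sketch in the spirit of Efficient Additive Attention; it is an illustration under assumed names and dimensions, not the Modified Efficient Additive Attention used in CUE-Net.

```python
import torch
import torch.nn as nn


class EfficientAdditiveAttention(nn.Module):
    """Illustrative linear-complexity additive attention (hypothetical).

    Instead of an (n x n) token-affinity matrix, queries are scored
    against a single learned vector and pooled into one global query,
    so time and memory grow as O(n * d) rather than O(n^2 * d).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.w_a = nn.Parameter(torch.randn(dim))  # learned scoring vector
        self.scale = dim ** -0.5
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.to_q(x)                          # (b, n, d)
        k = self.to_k(x)                          # (b, n, d)
        attn = (q @ self.w_a) * self.scale        # (b, n): one score per token
        attn = attn.softmax(dim=-1)
        g = torch.einsum('bn,bnd->bd', attn, q)   # (b, d): pooled global query
        # Broadcast the global context to every token, then mix and add back q
        return self.proj(g.unsqueeze(1) * k) + q  # (b, n, d)


# Example: 2 clips of 196 tokens with 64-dim embeddings; no 196x196 matrix is built
x = torch.randn(2, 196, 64)
out = EfficientAdditiveAttention(64)(x)           # (2, 196, 64)
```

Because the global query is a single d-dimensional vector, no pairwise token interactions are ever stored, which is what makes this family of mechanisms attractive for long spatiotemporal token sequences in video.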
Related papers
- iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency [0.0]
We introduce iiANET (Inception Inspired Attention Network), an efficient hybrid model designed to capture long-range dependencies in complex images.
The fundamental building block, iiABlock, integrates global 2D-MHSA (Multi-Head Self-Attention) with Registers, MBConv2 (MobileNetV2-based convolution), and dilated convolution in parallel.
We serially integrate an ECANET (Efficient Channel Attention Network) at the end of each iiABlock to calibrate channel-wise attention for enhanced model performance (a hypothetical sketch of this parallel-plus-serial design is given after this list).
arXiv Detail & Related papers (2024-07-10T12:39:02Z)
- Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition [68.6707284662443]
Low-Light Video Enhancement (LLVE) seeks to restore dynamic and static scenes plagued by severe invisibility and noise.
One critical aspect is formulating a consistency constraint specifically for the temporal-spatial illumination and appearance of the enhanced versions.
We present an innovative video Retinex-based decomposition strategy that operates without the need for explicit supervision.
arXiv Detail & Related papers (2024-05-24T15:56:40Z)
- Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns the weights of neural modules by optimizing over the corrupted test sequence, leveraging spatio-temporal coherence and internal statistics.
arXiv Detail & Related papers (2023-12-13T01:57:11Z)
- DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation [79.18450119567315]
Adverse weather conditions pose severe challenges for video-based transportation surveillance.
We propose a dual attention and dual frequency-guided dehazing network (termed DADFNet) for real-time visibility enhancement.
arXiv Detail & Related papers (2023-04-19T11:55:30Z)
- Video Violence Recognition and Localization using a Semi-Supervised Hard-Attention Model [0.0]
Violence monitoring and surveillance systems could keep communities safe and save lives.
Pushing the current state-of-the-art deep learning approaches to video violence recognition to higher levels of accuracy and performance could enable surveillance systems to be more reliable and scalable.
The main contribution of the proposed deep reinforcement learning method is to achieve state-of-the-art accuracy on the RWF, Hockey, and Movies datasets.
arXiv Detail & Related papers (2022-02-04T16:15:26Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP).
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
- Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to detecting crime.
This research focuses on integrating state-of-the-art Deep Learning methods to ensure a robust pipeline for autonomous surveillance for detecting violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z)
- Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks [67.7648985513978]
Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks.
We present a novel, channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism.
arXiv Detail & Related papers (2021-07-19T12:27:40Z)
- DR-TANet: Dynamic Receptive Temporal Attention Network for Street Scene Change Detection [35.29786193920396]
This paper proposes temporal attention and explores how the dependency-scope size of the temporal attention affects change detection performance.
On the street scene datasets 'GSV', 'TSUNAMI' and 'VL-CMU-CD', our approach gains excellent performance, establishing new state-of-the-art scores without bells and whistles.
arXiv Detail & Related papers (2021-03-01T10:01:35Z)
- An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement [132.60976158877608]
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high-complexity networks.
arXiv Detail & Related papers (2020-12-24T00:03:29Z)
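As noted in the iiANET entry above, the following is a hypothetical sketch of a parallel hybrid block in that spirit: multi-head self-attention, an MBConv-style convolution and a dilated convolution fused in parallel, followed by a serial ECA-style channel-attention stage (register tokens are omitted for brevity). The wiring, layer choices and names are assumptions for illustration, not the authors' iiABlock implementation.

```python
import torch
import torch.nn as nn


class ParallelHybridBlock(nn.Module):
    """Hypothetical parallel hybrid block: attention, MBConv-style and
    dilated-convolution branches fused in parallel, then an ECA-style
    channel gate applied serially (channels must divide by heads)."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.mbconv = nn.Sequential(  # pointwise expand -> depthwise -> pointwise
            nn.Conv2d(channels, channels * 2, 1),
            nn.Conv2d(channels * 2, channels * 2, 3, padding=1, groups=channels * 2),
            nn.GELU(),
            nn.Conv2d(channels * 2, channels, 1),
        )
        self.dilated = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        self.eca = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # ECA-style 1D conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (b, h*w, c)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        attn_out = attn_out.transpose(1, 2).reshape(b, c, h, w)
        y = attn_out + self.mbconv(x) + self.dilated(x)  # parallel fusion
        s = y.mean(dim=(2, 3)).unsqueeze(1)              # (b, 1, c) channel stats
        gate = torch.sigmoid(self.eca(s))                # (b, 1, c) channel weights
        return y * gate.transpose(1, 2).unsqueeze(-1)    # serial channel gating


# Example: one 64-channel 32x32 feature map
feat = torch.randn(1, 64, 32, 32)
out = ParallelHybridBlock(64)(feat)  # (1, 64, 32, 32)
```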
This list is automatically generated from the titles and abstracts of the papers on this site.