Two-stream Multi-dimensional Convolutional Network for Real-time
Violence Detection
- URL: http://arxiv.org/abs/2211.04255v1
- Date: Tue, 8 Nov 2022 14:04:47 GMT
- Title: Two-stream Multi-dimensional Convolutional Network for Real-time
Violence Detection
- Authors: Dipon Kumar Ghosh and Amitabha Chakrabarty
- Abstract summary: This work presents a novel architecture for violence detection called Two-stream Multi-dimensional Convolutional Network (2s-MDCN).
Our proposed method extracts temporal and spatial information independently by 1D, 2D, and 3D convolutions.
Our models obtained state-of-the-art accuracy of 89.7% on the largest violence detection benchmark dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing number of surveillance cameras and security concerns have made
automatic violent activity detection from surveillance footage an active area
for research. Modern deep learning methods have achieved good accuracy in
violence detection and proved to be successful because of their applicability
in intelligent surveillance systems. However, the models are computationally
expensive and large in size because of their inefficient methods for feature
extraction. This work presents a novel architecture for violence detection
called Two-stream Multi-dimensional Convolutional Network (2s-MDCN), which uses
RGB frames and optical flow to detect violence. Our proposed method extracts
temporal and spatial information independently by 1D, 2D, and 3D convolutions.
Despite combining multi-dimensional convolutional networks, our models are
lightweight and efficient due to reduced channel capacity, yet they learn to
extract meaningful spatial and temporal information. Additionally, combining
RGB frames and optical flow yields 2.2% more accuracy than a single RGB stream.
Regardless of having less complexity, our models obtained state-of-the-art
accuracy of 89.7% on the largest violence detection benchmark dataset.
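To make the two-stream, multi-dimensional idea concrete, the following is a minimal PyTorch sketch: each stream applies 3D (spatiotemporal), frame-wise 2D (spatial), and time-wise 1D (temporal) convolutions, and the RGB and optical-flow streams are fused late. The channel widths, fusion scheme, and module names are illustrative assumptions, not the authors' exact 2s-MDCN configuration.

```python
import torch
import torch.nn as nn

class MDCBlock(nn.Module):
    """Extracts 3D (spatiotemporal), 2D (spatial), and 1D (temporal) features."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv3d = nn.Conv3d(in_ch, out_ch, 3, padding=1)
        self.conv2d = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv1d = nn.Conv1d(in_ch, out_ch, 3, padding=1)

    def forward(self, x):                          # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        f3 = self.conv3d(x)                        # joint space-time features
        # 2D conv applied frame-wise: fold time into the batch dimension.
        f2 = self.conv2d(x.transpose(1, 2).reshape(b * t, c, h, w))
        f2 = f2.reshape(b, t, -1, h, w).transpose(1, 2)
        # 1D conv over time on spatially pooled features, broadcast back.
        f1 = self.conv1d(x.mean(dim=(3, 4)))       # (B, C, T) -> (B, out, T)
        return torch.relu(f3 + f2 + f1[..., None, None])

class TwoStreamMDCN(nn.Module):
    def __init__(self, width=16, num_classes=2):
        super().__init__()
        self.rgb = MDCBlock(3, width)              # RGB appearance stream
        self.flow = MDCBlock(2, width)             # optical-flow motion stream
        self.head = nn.Linear(2 * width, num_classes)

    def forward(self, rgb, flow):                  # late fusion of both streams
        f = torch.cat([self.rgb(rgb).mean(dim=(2, 3, 4)),
                       self.flow(flow).mean(dim=(2, 3, 4))], dim=1)
        return self.head(f)

logits = TwoStreamMDCN()(torch.randn(4, 3, 16, 112, 112),   # RGB clips
                         torch.randn(4, 2, 16, 112, 112))   # flow (dx, dy)
```

The small channel width (16 here) mirrors the paper's point that combining multi-dimensional convolutions can stay lightweight when channel capacity is kept low.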
Related papers
- 2D bidirectional gated recurrent unit convolutional Neural networks for end-to-end violence detection In videos [0.0]
We propose an architecture that combines a Bidirectional Gated Recurrent Unit (BiGRU) and a 2D Convolutional Neural Network (CNN) to detect violence in video sequences.
A CNN is used to extract spatial characteristics from each frame, while the BiGRU extracts temporal and local motion characteristics using CNN extracted features from multiple frames.
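A generic sketch of this CNN-plus-BiGRU pattern follows; layer sizes and classifying from the last time step are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CNNBiGRU(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(                  # per-frame spatial encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.bigru = nn.GRU(feat_dim, hidden, batch_first=True,
                            bidirectional=True)    # temporal model over frames
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip):                       # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))       # (B*T, feat_dim)
        seq, _ = self.bigru(feats.reshape(b, t, -1))
        return self.head(seq[:, -1])               # classify from last step

logits = CNNBiGRU()(torch.randn(2, 16, 3, 112, 112))  # (2, 2)
```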
arXiv Detail & Related papers (2024-09-11T19:36:12Z)
- Violence detection in videos using deep recurrent and convolutional neural networks [0.0]
We propose a deep learning architecture for violence detection which combines recurrent neural networks (RNNs) and 2-dimensional convolutional neural networks (2D CNNs).
In addition to video frames, we use optical flow computed using the captured sequences.
The proposed approaches reach the same level as the state-of-the-art techniques and sometimes surpass them.
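The general recipe, sketched below with assumed encoders and an LSTM standing in for the RNN, is to encode each frame and each optical-flow field separately and fuse the features before the recurrent layer.

```python
import torch
import torch.nn as nn

def encoder(in_ch, dim=64):
    return nn.Sequential(nn.Conv2d(in_ch, dim, 3, stride=4, padding=1),
                         nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())

class FlowAugmentedRNN(nn.Module):
    def __init__(self, dim=64, num_classes=2):
        super().__init__()
        self.rgb_enc = encoder(3, dim)             # encodes each RGB frame
        self.flow_enc = encoder(2, dim)            # encodes each flow field
        self.rnn = nn.LSTM(2 * dim, dim, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, frames, flows):              # (B,T,3,H,W), (B,T,2,H,W)
        b, t = frames.shape[:2]
        f = self.rgb_enc(frames.flatten(0, 1)).reshape(b, t, -1)
        g = self.flow_enc(flows.flatten(0, 1)).reshape(b, t, -1)
        seq, _ = self.rnn(torch.cat([f, g], dim=-1))  # early feature fusion
        return self.head(seq[:, -1])

out = FlowAugmentedRNN()(torch.randn(2, 8, 3, 64, 64),
                         torch.randn(2, 8, 2, 64, 64))   # (2, 2)
```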
arXiv Detail & Related papers (2024-09-11T19:21:51Z)
- 2D-Malafide: Adversarial Attacks Against Face Deepfake Detection Systems [8.717726409183175]
We introduce 2D-Malafide, a novel and lightweight adversarial attack designed to deceive face deepfake detection systems.
Unlike traditional additive noise approaches, 2D-Malafide optimises a small number of filter coefficients to generate robust adversarial perturbations.
Experiments, conducted using the FaceForensics++ dataset, demonstrate that 2D-Malafide substantially degrades detection performance in both white-box and black-box settings.
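A hedged sketch of such a filter-based attack: instead of optimising per-pixel noise, gradient descent tunes a small depthwise convolution kernel (initialised to the identity) so the filtered image fools a differentiable detector. Here `detector`, `target_label`, and all hyperparameters are stand-ins, not 2D-Malafide's actual procedure.

```python
import torch
import torch.nn.functional as F

def filter_attack(image, detector, target_label, steps=100, lr=1e-2, k=7):
    """Optimise a depthwise k x k kernel so `detector` mislabels the image."""
    kernel = torch.zeros(3, 1, k, k)               # one kernel per colour channel
    kernel[:, 0, k // 2, k // 2] = 1.0             # identity: no change at start
    kernel.requires_grad_(True)
    opt = torch.optim.Adam([kernel], lr=lr)
    for _ in range(steps):
        adv = F.conv2d(image, kernel, padding=k // 2, groups=3).clamp(0, 1)
        loss = F.cross_entropy(detector(adv), target_label)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return F.conv2d(image, kernel, padding=k // 2, groups=3).clamp(0, 1)
```

This sketch assumes white-box access to the detector's gradients; the black-box setting mentioned above would need a transfer- or query-based variant.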
arXiv Detail & Related papers (2024-08-26T09:41:40Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects, built on an encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods.
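One plausible, lightweight reading of a depth-wise attention block, in which each channel aggregates global context over its own spatial positions (query-free, GCNet-style); the paper's actual DSA module may differ.

```python
import torch
import torch.nn as nn

class DepthwiseSelfAttention(nn.Module):
    """Each channel attends over its own spatial positions; a guess at the
    flavour of a depth-wise attention module, not its exact specification."""
    def __init__(self, channels, k=3):
        super().__init__()
        # Depthwise convs produce keys and values without mixing channels.
        self.key = nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
        self.value = nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        key = self.key(x).reshape(b, c, h * w)
        val = self.value(x).reshape(b, c, h * w)
        attn = torch.softmax(key, dim=-1)          # per-channel spatial weights
        ctx = (attn * val).sum(dim=-1)             # (B, C) global context
        return x + ctx[:, :, None, None]           # inject context everywhere

y = DepthwiseSelfAttention(32)(torch.randn(2, 32, 28, 28))
```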
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
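The core idea of saliency-based non-uniform sampling can be sketched with a separable inverse-CDF warp: rows and columns carrying more saliency mass receive more output pixels. This illustrates the general technique only, not SALISA's exact resampling.

```python
import torch
import torch.nn.functional as F

def saliency_resample(image, saliency, out_h, out_w, eps=1e-6):
    """Warp image (B,C,H,W) so salient rows/columns get more output pixels.

    saliency: (B,1,H,W), non-negative. Separable: the inverse CDF of the
    row/column saliency mass defines where the output grid samples.
    """
    b, _, in_h, in_w = image.shape
    px = saliency.sum(dim=(1, 2)) + eps                   # column mass (B, W)
    py = saliency.sum(dim=(1, 3)) + eps                   # row mass    (B, H)
    cx = torch.cumsum(px, -1) / px.sum(-1, keepdim=True)  # column CDF
    cy = torch.cumsum(py, -1) / py.sum(-1, keepdim=True)  # row CDF
    # Invert the CDFs at evenly spaced quantiles -> dense where salient.
    qx = torch.linspace(0, 1, out_w).expand(b, -1).contiguous()
    qy = torch.linspace(0, 1, out_h).expand(b, -1).contiguous()
    ix = torch.searchsorted(cx.contiguous(), qx).clamp(max=in_w - 1)
    iy = torch.searchsorted(cy.contiguous(), qy).clamp(max=in_h - 1)
    gx = ix.float() / (in_w - 1) * 2 - 1                  # grid x in [-1, 1]
    gy = iy.float() / (in_h - 1) * 2 - 1                  # grid y in [-1, 1]
    grid = torch.stack(torch.broadcast_tensors(
        gx[:, None, :], gy[:, :, None]), dim=-1)          # (B, out_h, out_w, 2)
    return F.grid_sample(image, grid, align_corners=True)
```

A detector run on the warped image then spends its limited resolution where the saliency mass is, which is what helps small objects.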
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to crime detection.
This research focuses on integrating state-of-the-art Deep Learning methods to ensure a robust pipeline for autonomous surveillance for detecting violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing the convolution operation at each gate of the ConvLSTM with a depthwise separable convolution.
Our model exceeds the accuracy of existing models on the larger and more challenging RWF-2000 dataset by more than a 2% margin.
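The stated substitution is easy to sketch: a standard ConvLSTM cell whose single gate convolution is replaced by a depthwise separable convolution (depthwise k x k followed by pointwise 1 x 1). The gate layout below is the textbook ConvLSTM; details of the paper's variant may differ.

```python
import torch
import torch.nn as nn

def sep_conv(in_ch, out_ch, k=3):
    """Depthwise separable conv = depthwise k x k + pointwise 1 x 1."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, 1))

class SepConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hidden_ch):
        super().__init__()
        # One separable conv produces all four gates at once.
        self.gates = sep_conv(in_ch + hidden_ch, 4 * hidden_ch)

    def forward(self, x, state):                   # x: (B, C, H, W)
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], 1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

cell = SepConvLSTMCell(in_ch=3, hidden_ch=8)
h = c = torch.zeros(1, 8, 32, 32)
for frame in torch.randn(5, 1, 3, 32, 32):         # unroll over 5 frames
    out, (h, c) = cell(frame, (h, c))
```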
arXiv Detail & Related papers (2021-02-21T12:01:48Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
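A minimal sketch of the center-and-scale formulation: the head predicts a per-location center heatmap and a box scale instead of anchor offsets, on features fused from the two modalities (concatenation here is an assumed fusion, not necessarily the paper's).

```python
import torch
import torch.nn as nn

class CenterScaleHead(nn.Module):
    def __init__(self, feat_ch=64):
        super().__init__()
        self.center = nn.Conv2d(feat_ch, 1, 1)     # center-point heatmap logit
        self.scale = nn.Conv2d(feat_ch, 2, 1)      # log height / width per pixel

    def forward(self, fused):                      # fused multispectral features
        heat = torch.sigmoid(self.center(fused))   # (B, 1, H, W) centerness
        size = self.scale(fused).exp()             # (B, 2, H, W) box scales
        return heat, size

# Simple multispectral fusion: concatenate RGB and thermal feature maps.
rgb_feat, th_feat = torch.randn(2, 32, 40, 40), torch.randn(2, 32, 40, 40)
heat, size = CenterScaleHead(64)(torch.cat([rgb_feat, th_feat], dim=1))
```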
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- Temporal Distinct Representation Learning for Action Recognition [139.93983070642412]
Two-Dimensional Convolutional Neural Networks (2D CNNs) are used to characterize videos.
Different frames of a video share the same 2D CNN kernels, which may result in repeated and redundant information utilization.
We propose a sequential channel filtering mechanism to excite the discriminative channels of features from different frames step by step, and thus avoid repeated information extraction.
Our method is evaluated on benchmark temporal reasoning datasets Something-Something V1 and V2, and it achieves visible improvements over the best competitor by 2.4% and 1.3%, respectively.
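An illustrative reading of sequential channel filtering: a running memory of already-excited channels gates each new frame's features, so later steps favour channels not yet used. The gating form below is an assumption, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class SequentialChannelFilter(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Linear(channels, channels)

    def forward(self, feats):                      # feats: (B, T, C, H, W)
        b, t, c, _, _ = feats.shape
        used = torch.zeros(b, c, device=feats.device)   # channel usage memory
        outs = []
        for step in range(t):
            desc = feats[:, step].mean(dim=(2, 3))      # (B, C) descriptor
            excite = torch.sigmoid(self.gate(desc)) * (1 - used)
            outs.append(feats[:, step] * excite[:, :, None, None])
            used = torch.clamp(used + excite, max=1.0)  # mark channels as used
        return torch.stack(outs, dim=1)

y = SequentialChannelFilter(16)(torch.randn(2, 8, 16, 14, 14))
```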
arXiv Detail & Related papers (2020-07-15T11:30:40Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)