Related papers: Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms

Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms

URL: http://arxiv.org/abs/2302.11027v1
Date: Tue, 21 Feb 2023 22:02:39 GMT
Title: Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms
Authors: Labib Ahmed Siddique, Rabita Junhai, Tanzim Reza, Salman Sayeed Khan, and Tanvir Rahman
Abstract summary: Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety. Deep learning video classification techniques can help us automate surveillance systems to detect violence as it happens.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens. In this research, we explore deep learning video classification techniques to detect violence as they are happening. Traditional image classification techniques fall short when it comes to classifying videos as they attempt to classify each frame separately for which the predictions start to flicker. Therefore, many researchers are coming up with video classification techniques that consider spatiotemporal features while classifying. However, deploying these deep learning models with methods such as skeleton points obtained through pose estimation and optical flow obtained through depth sensors, are not always practical in an IoT environment. Although these techniques ensure a higher accuracy score, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor) CNNTransformer and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLstm ,76.76% on CNN-Transformer and 80% on C3D.

Related papers

Intelligent Image Sensing for Crime Analysis: A ML Approach towards Enhanced Violence Detection and Investigation [1.8219466405383231]
This paper introduces a comprehensive framework for violence detection and classification, employing Supervised Learning for both binary and multi-class violence classification.<n>Training is conducted on a diverse customized datasets with frame-level annotations, incorporating videos from surveillance cameras, human recordings, hockey fight, sohas and wvd dataset across various platforms.
arXiv Detail & Related papers (2025-06-16T18:39:16Z)
Deepfake Detection with Spatio-Temporal Consistency and Attention [46.1135899490656]
Deepfake videos are causing growing concerns among communities due to their ever-increasing realism. Current methods for detecting forged videos rely mainly on global frame features. We propose a neural Deepfake detector that focuses on the localized manipulative signatures of the forged videos.
arXiv Detail & Related papers (2025-02-12T08:51:33Z)
Real-Time Anomaly Detection in Video Streams [0.0]
This thesis is part of a CIFRE agreement between the company Othello and the LIASD laboratory. The objective is to develop an artificial intelligence system that can detect real-time dangers in a video stream.
arXiv Detail & Related papers (2024-11-29T14:24:33Z)
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images [59.24281591714385]
Gun violence is a critical security problem, and it is imperative for the computer vision community to develop effective gun detection algorithms. detecting guns in real-world CCTV images remains a challenging and under-explored task. We present a benchmark, called textbfCCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images.
arXiv Detail & Related papers (2023-03-19T16:17:35Z)
Detecting train driveshaft damages using accelerometer signals and Differential Convolutional Neural Networks [67.60224656603823]
This paper proposes the development of a railway axle condition monitoring system based on advanced 2D-Convolutional Neural Network (CNN) architectures. The resultant system converts the railway axle vibration signals into time-frequency domain representations, i.e., spectrograms, and, thus, trains a two-dimensional CNN to classify them depending on their cracks.
arXiv Detail & Related papers (2022-11-15T15:04:06Z)
Intelligent 3D Network Protocol for Multimedia Data Classification using Deep Learning [0.0]
We implement Hybrid Deep Learning Architecture that combines STIP and 3D CNN features to enhance the performance of 3D videos effectively. The results are compared with state-of-the-art frameworks from literature for action recognition on UCF101 with an accuracy of 95%.
arXiv Detail & Related papers (2022-07-23T12:24:52Z)
Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to detect crime. This research focuses on integrating state-of-the-art Deep Learning methods to ensure a robust pipeline for autonomous surveillance for detecting violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z)
Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection. A co-attention formulation is utilized to combine the low-level and high-level features. We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [0.0]
Long-term motion patterns alone play a pivotal role in the task of recognizing an event. We show that the long-term motion patterns alone play a pivotal role in the task of recognizing an event. Only the temporal features are exploited using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
arXiv Detail & Related papers (2021-11-03T08:30:38Z)
Spatiotemporal Inconsistency Learning for DeepFake Video Detection [51.747219106855624]
We present a novel temporal modeling paradigm in TIM by exploiting the temporal difference over adjacent frames along with both horizontal and vertical directions. And the ISM simultaneously utilizes the spatial information from SIM and temporal information from TIM to establish a more comprehensive spatial-temporal representation.
arXiv Detail & Related papers (2021-09-04T13:05:37Z)
Adversarially robust deepfake media detection using fused convolutional neural network predictions [79.00202519223662]
Current deepfake detection systems struggle against unseen data. We employ three different deep Convolutional Neural Network (CNN) models to classify fake and real images extracted from videos. The proposed technique outperforms state-of-the-art models with 96.5% accuracy.
arXiv Detail & Related papers (2021-02-11T11:28:00Z)
Training Strategies and Data Augmentations in CNN-based DeepFake Video Detection [17.696134665850447]
The accuracy of automated systems for face forgery detection in videos is still quite limited and generally biased toward the dataset used to design and train a specific detection system. In this paper we analyze how different training strategies and data augmentation techniques affect CNN-based deepfake detectors when training and testing on the same dataset or across different datasets.
arXiv Detail & Related papers (2020-11-16T08:50:56Z)
A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation. T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed. Our method achieves clear improvements on UCF101 action recognition benchmark against state-of-the-art real-time methods by 5.4% in terms of accuracy and 2 times faster in terms of inference speed with a less than 5MB storage model.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos [5.414308305392762]
We propose a novel scheme for human action recognition in videos, using a 3-dimensional Convolutional Neural Network (3D CNN) based classifier. In this paper, a 3D CNN architecture is proposed to extract featuresweighted and follows Long Short-Term Memory (LSTM) to recognize human actions. Experiments are performed with KTH and WEIZMANN human actions datasets, whereby it is shown to produce comparable results with the state-of-the-art techniques.
arXiv Detail & Related papers (2020-02-06T05:07:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.