Streamlining Video Analysis for Efficient Violence Detection
- URL: http://arxiv.org/abs/2412.02127v1
- Date: Fri, 29 Nov 2024 06:32:36 GMT
- Title: Streamlining Video Analysis for Efficient Violence Detection
- Authors: Gourang Pathak, Abhay Kumar, Sannidhya Rawat, Shikha Gupta
- Abstract summary: This paper addresses the challenge of automated violence detection in video frames captured by surveillance cameras.
We propose an approach using a 3D Convolutional Neural Network (3D CNN)-based model named X3D to tackle this problem.
- Score: 1.444946491007292
- Abstract: This paper addresses the challenge of automated violence detection in video frames captured by surveillance cameras, specifically focusing on classifying scenes as "fight" or "non-fight." This task is critical for enhancing unmanned security systems, online content filtering, and related applications. We propose an approach using a 3D Convolutional Neural Network (3D CNN)-based model named X3D to tackle this problem. Our approach incorporates pre-processing steps such as tube extraction, volume cropping, and frame aggregation, combined with clustering techniques, to accurately localize and classify fight scenes. Extensive experimentation demonstrates the effectiveness of our method in distinguishing violent from non-violent events, providing valuable insights for advancing practical violence detection systems.
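The abstract names frame aggregation as a pre-processing step but does not say how frames are grouped or how clip-level outputs become a single "fight" / "non-fight" decision. The sketch below is only an illustration under stated assumptions: fixed-length clips taken at a regular stride, and a simple mean over per-clip fight probabilities. The function names, the stride scheme, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def sample_clip_indices(num_frames: int, clip_len: int, stride: int) -> List[List[int]]:
    """Return frame indices for fixed-length clips starting every `stride` frames.

    Assumption: clips that would run past the end of the video are dropped.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    clips = []
    for start in range(0, num_frames - clip_len + 1, stride):
        clips.append(list(range(start, start + clip_len)))
    return clips


def aggregate_clip_scores(scores: Sequence[float], threshold: float = 0.5) -> str:
    """Average per-clip fight probabilities and map the mean to a label.

    Assumption: mean pooling with a fixed threshold; the paper may use
    a different rule.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    mean_score = sum(scores) / len(scores)
    return "fight" if mean_score >= threshold else "non-fight"
```

In this sketch, each list of indices would select the frames passed to a 3D CNN such as X3D, and the per-clip probabilities it returns would be fed to `aggregate_clip_scores`.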
Related papers
- DIFEM: Key-points Interaction based Feature Extraction Module for Violence Recognition in Videos [6.375350222633163]
We propose an effective method which leverages human skeleton key-points to capture inherent properties of violence.
At the heart of our method is our novel Dynamic Interaction Feature Extraction Module (DIFEM), which captures features such as velocity and joint intersections.
With the features extracted by our DIFEM, we use various classification algorithms such as Random Forest, Decision Tree, AdaBoost, and k-Nearest Neighbor.
arXiv Detail & Related papers (2024-12-06T19:25:47Z)
- Analysis of Unstructured High-Density Crowded Scenes for Crowd Monitoring [55.2480439325792]
We are interested in developing an automated system for detection of organized movements in human crowds.
Computer vision algorithms can extract information from videos of crowded scenes.
We can estimate the number of participants in an organized cohort.
arXiv Detail & Related papers (2024-08-06T22:09:50Z)
- JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos [4.94659999696881]
Violence detection in surveillance videos presents additional issues, such as the wide variety of real fight scenes.
We introduce JOSENet, a self-supervised framework that provides outstanding performance for violence detection in surveillance videos.
arXiv Detail & Related papers (2024-05-05T15:01:00Z)
- AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware Robust Adversarial Training [64.14759275211115]
We propose a depth-aware robust adversarial training method for monocular 3D object detection, dubbed DART3D.
Our adversarial training approach capitalizes on the inherent uncertainty, enabling the model to significantly improve its robustness against adversarial attacks.
arXiv Detail & Related papers (2023-09-03T07:05:32Z)
- CCTV-Gun: Benchmarking Handgun Detection in CCTV Images [59.24281591714385]
Gun violence is a critical security problem, and it is imperative for the computer vision community to develop effective gun detection algorithms.
Detecting guns in real-world CCTV images remains a challenging and under-explored task.
We present a benchmark, called CCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images.
arXiv Detail & Related papers (2023-03-19T16:17:35Z)
- SSIVD-Net: A Novel Salient Super Image Classification & Detection Technique for Weaponized Violence [3.651114792588495]
Detection of violence and weaponized violence in CCTV footage requires a comprehensive approach.
We introduce the Smart-City CCTV Violence Detection (SCVD) dataset.
We propose a novel technique called SSIVD-Net (Salient-Super-Image for Violence Detection).
arXiv Detail & Related papers (2022-07-26T12:31:01Z)
- Detecting Violence in Video Based on Deep Features Fusion Technique [0.30458514384586394]
This work proposes a novel method to detect violence using a fusion technique of two convolutional neural networks (CNNs).
The performance of the proposed method is evaluated using three standard benchmark datasets in terms of detection accuracy.
arXiv Detail & Related papers (2022-04-15T12:51:20Z)
- gradSim: Differentiable simulation for system identification and visuomotor control [66.37288629125996]
We present gradSim, a framework that overcomes the dependence on 3D supervision by leveraging differentiable multiphysics simulation and differentiable rendering.
Our unified graph enables learning in challenging visuomotor control tasks, without relying on state-based (3D) supervision.
arXiv Detail & Related papers (2021-04-06T16:32:01Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains a frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos [72.50607929306058]
We propose a real-time online system to perform activity detection on untrimmed security videos.
The proposed method consists of three stages: tubelet extraction, activity classification and online tubelet merging.
We demonstrate the effectiveness of the proposed approach in terms of speed (100 fps) and performance with state-of-the-art results.
arXiv Detail & Related papers (2020-04-23T22:20:10Z)
- Vision-based Fight Detection from Surveillance Cameras [6.982738885923204]
This paper explores LSTM-based approaches to solve the fight scene classification problem.
A new dataset is collected, which consists of fight scenes from surveillance camera videos available on YouTube.
It is observed that the proposed approach, which integrates the Xception model, Bi-LSTM, and attention, improves the state-of-the-art accuracy for fight scene classification.
arXiv Detail & Related papers (2020-02-11T12:56:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.