Video Vision Transformers for Violence Detection
- URL: http://arxiv.org/abs/2209.03561v1
- Date: Thu, 8 Sep 2022 04:44:01 GMT
- Title: Video Vision Transformers for Violence Detection
- Authors: Sanskar Singh, Shivaibhav Dewangan, Ghanta Sai Krishna, Vandit Tyagi,
Sainath Reddy
- Abstract summary: The proposed solution uses a novel end-to-end deep learning-based video vision transformer (ViViT) that can proficiently discern fights, hostile movements, and violent events in video sequences.
The evaluated results can subsequently be sent to the relevant local authority, and the captured video can be analyzed.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting violent incidents in surveillance footage has a significant
impact on law enforcement and city safety. Although modern (smart) cameras are
widely available and affordable, such technological solutions are ineffective in
most instances. Furthermore, personnel monitoring CCTV recordings frequently
react late, which can lead to serious harm to people and property. Automated
detection of violence that enables swift action is therefore crucial. The
proposed solution uses a novel end-to-end deep learning-based
video vision transformer (ViViT) that can proficiently discern fights, hostile
movements, and violent events in video sequences. The study uses
a data augmentation strategy to compensate for the weaker inductive
bias of vision transformers when they are trained on smaller datasets. The
evaluated results can subsequently be sent to the relevant local authority, and
the captured video can be analyzed. Compared with state-of-the-art (SOTA)
approaches, the proposed method achieved promising performance on some of the
challenging benchmark datasets.
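The abstract names two ingredients without detailing them: data augmentation for small datasets and a ViViT-style model, which operates on spatio-temporal "tubelet" tokens. The following is a minimal sketch of both steps, not the authors' implementation. The specific augmentations (random temporal crop, horizontal flip), the function names, and all sizes are assumptions chosen for illustration; the abstract does not say which augmentations the study uses.

```python
import numpy as np


def random_clip_augment(video, clip_len, rng):
    """Hypothetical augmentation: random temporal crop plus random horizontal flip.

    video has shape (T, H, W, C); the result has shape (clip_len, H, W, C).
    """
    total_frames = video.shape[0]
    # Pick a start frame so that the clip fits entirely inside the video.
    start = rng.integers(0, total_frames - clip_len + 1)
    clip = video[start:start + clip_len]
    # Flip along the width axis with probability 0.5.
    if rng.random() < 0.5:
        clip = clip[:, :, ::-1, :]
    return clip


def extract_tubelets(clip, t, h, w):
    """Split a clip into non-overlapping t x h x w tubelets, one flat vector each.

    In a ViViT-style model each vector would then be linearly projected to a token;
    the projection and the transformer itself are omitted here.
    """
    T, H, W, C = clip.shape
    assert T % t == 0 and H % h == 0 and W % w == 0, "clip must tile evenly"
    x = clip.reshape(T // t, t, H // h, h, W // w, w, C)
    # Bring the three grid axes to the front: (nt, nh, nw, t, h, w, C).
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, t * h * w * C)


rng = np.random.default_rng(0)
video = rng.random((32, 64, 64, 3), dtype=np.float32)  # stand-in for real footage
clip = random_clip_augment(video, clip_len=16, rng=rng)
tokens = extract_tubelets(clip, t=2, h=16, w=16)
print(tokens.shape)  # (128, 1536): 8*4*4 tubelets, each 2*16*16*3 values
```

Each augmented clip yields a different token sequence from the same source video, which is one common way to stretch a small training set.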
Related papers
- Analysis of Unstructured High-Density Crowded Scenes for Crowd Monitoring [55.2480439325792]
We are interested in developing an automated system for detection of organized movements in human crowds.
Computer vision algorithms can extract information from videos of crowded scenes.
We can estimate the number of participants in an organized cohort.
arXiv Detail & Related papers (2024-08-06T22:09:50Z)
- JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos [4.94659999696881]
Violence detection in surveillance videos presents additional issues, such as the wide variety of real fight scenes.
We introduce JOSENet, a self-supervised framework that provides outstanding performance for violence detection in surveillance videos.
arXiv Detail & Related papers (2024-05-05T15:01:00Z)
- A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos [22.553356096143734]
We propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos.
Our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames.
arXiv Detail & Related papers (2023-07-27T01:45:13Z)
- NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection [50.33525966541906]
Existing multimodal detection methods capture audio-visual inconsistencies to expose Deepfake videos.
We propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics.
Our model can be easily adapted to the downstream Deepfake datasets with fine-tuning.
arXiv Detail & Related papers (2023-06-12T06:06:05Z)
- CCTV-Gun: Benchmarking Handgun Detection in CCTV Images [59.24281591714385]
Gun violence is a critical security problem, and it is imperative for the computer vision community to develop effective gun detection algorithms.
Detecting guns in real-world CCTV images remains a challenging and under-explored task.
We present a benchmark, called CCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images.
arXiv Detail & Related papers (2023-03-19T16:17:35Z)
- Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z)
- SSIVD-Net: A Novel Salient Super Image Classification & Detection Technique for Weaponized Violence [3.651114792588495]
Detection of violence and weaponized violence in CCTV footage requires a comprehensive approach.
We introduce the Smart-City CCTV Violence Detection (SCVD) dataset.
We propose a novel technique called SSIVD-Net (Salient-Super-Image for Violence Detection).
arXiv Detail & Related papers (2022-07-26T12:31:01Z)
- E^2TAD: An Energy-Efficient Tracking-based Action Detector [78.90585878925545]
This paper presents a tracking-based solution to accurately and efficiently localize predefined key actions.
It won first place in the UAV-Video Track of the 2021 Low-Power Computer Vision Challenge (LPCVC).
arXiv Detail & Related papers (2022-04-09T07:52:11Z)
- Video Violence Recognition and Localization using a Semi-Supervised Hard-Attention Model [0.0]
Violence monitoring and surveillance systems could keep communities safe and save lives.
Advancing the current state-of-the-art deep learning approaches to video violence recognition to higher levels of accuracy and performance could make surveillance systems more reliable and scalable.
The main contribution of the proposed deep reinforcement learning method is to achieve state-of-the-art accuracy on RWF, Hockey, and Movies datasets.
arXiv Detail & Related papers (2022-02-04T16:15:26Z)
- Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to detect crime.
This research focuses on integrating state-of-the-art Deep Learning methods to ensure a robust pipeline for autonomous surveillance for detecting violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z)
- Training-free Monocular 3D Event Detection System for Traffic Surveillance [93.65240041833319]
Existing event detection systems are mostly learning-based and have achieved convincing performance when a large amount of training data is available.
In real-world scenarios, collecting sufficient labeled training data is expensive and sometimes impossible.
We propose a training-free monocular 3D event detection system for traffic surveillance.
arXiv Detail & Related papers (2020-02-01T04:42:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.