TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural
Networks for Real-Time Handgun Detection in Video
- URL: http://arxiv.org/abs/2111.08867v2
- Date: Fri, 19 Nov 2021 03:49:25 GMT
- Title: TYolov5: A Temporal Yolov5 Detector Based on Quasi-Recurrent Neural
Networks for Real-Time Handgun Detection in Video
- Authors: Mario Alberto Duran-Vega, Miguel Gonzalez-Mendoza, Leonardo Chang,
Cuauhtemoc Daniel Suarez-Ramirez
- Abstract summary: Timely handgun detection is a crucial problem to improve public safety.
Much of the previous research on handgun detection is based on static image detectors.
To improve the performance of surveillance systems, a real-time temporal handgun detection system should be built.
- Score: 0.5735035463793008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Timely handgun detection is a crucial problem to improve public safety;
nevertheless, the effectiveness of many surveillance systems still depends on
finite human attention. Much of the previous research on handgun detection is
based on static image detectors, leaving aside valuable temporal information
that could be used to improve object detection in videos. To improve the
performance of surveillance systems, a real-time temporal handgun detection
system should be built. Using Temporal Yolov5, an architecture based on
Quasi-Recurrent Neural Networks, temporal information is extracted from video
to improve the results of handgun detection. Moreover, two publicly available
datasets are proposed, labeled with hands, guns, and phones: one containing
2199 static images to train static detectors, and another with 5960 video
frames to train temporal modules. Additionally, we explore two temporal data
augmentation techniques based on Mosaic and Mixup. The resulting systems are
three temporal architectures: one focused on reducing inference time, with a
mAP$_{50:95}$ of 55.9; another balancing inference time and accuracy, with a
mAP$_{50:95}$ of 59; and a last one specialized in accuracy, with a
mAP$_{50:95}$ of 60.2. Temporal Yolov5 achieves real-time detection in
the small and medium architectures. Moreover, it takes advantage of temporal
features contained in videos to perform better than Yolov5 in our temporal
dataset, making TYolov5 suitable for real-world applications. The source code
is publicly available at https://github.com/MarioDuran/TYolov5.
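The abstract names Quasi-Recurrent Neural Networks (QRNNs) as the temporal building block. As a rough illustration only, and not the TYolov5 implementation, the sketch below shows the standard QRNN "fo-pooling" recurrence applied to per-frame feature vectors. The function name `fo_pool` and the plain-list inputs are assumptions made for this example; in an actual QRNN the three gate inputs come from convolutions over the sequence.

```python
import math


def sigmoid(x):
    """Logistic function used for the forget and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def fo_pool(z_seq, f_seq, o_seq):
    """Hypothetical helper: QRNN fo-pooling over a sequence of frames.

    z_seq, f_seq, o_seq are lists with one entry per frame; each entry is a
    list of pre-activation floats (candidate, forget gate, output gate).
    Returns the hidden state for every frame.
    """
    dim = len(z_seq[0])
    c = [0.0] * dim  # cell state carried across frames, starts at zero
    hidden = []
    for z_raw, f_raw, o_raw in zip(z_seq, f_seq, o_seq):
        z = [math.tanh(v) for v in z_raw]
        f = [sigmoid(v) for v in f_raw]
        o = [sigmoid(v) for v in o_raw]
        # c_t = f_t * c_{t-1} + (1 - f_t) * z_t
        c = [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(f, c, z)]
        # h_t = o_t * c_t
        hidden.append([oi * ci for oi, ci in zip(o, c)])
    return hidden


# Frame 1 writes a value into memory (forget gate near 0);
# frame 2 keeps it (forget gate near 1) despite a different candidate.
states = fo_pool(
    z_seq=[[100.0], [-100.0]],
    f_seq=[[-100.0], [100.0]],
    o_seq=[[100.0], [100.0]],
)
```

Only the element-wise recurrence runs sequentially here, which is the property that makes QRNN-style layers attractive when inference speed matters.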
Related papers
- Spatiotemporal Attention-based Semantic Compression for Real-time Video
Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and pixels in each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information.
Experimental results show that ViT_STAE can compress the video dataset HMDB51 by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- CCTV-Gun: Benchmarking Handgun Detection in CCTV Images [59.24281591714385]
Gun violence is a critical security problem, and it is imperative for the computer vision community to develop effective gun detection algorithms.
Detecting guns in real-world CCTV images remains a challenging and under-explored task.
We present a benchmark, called CCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images.
arXiv Detail & Related papers (2023-03-19T16:17:35Z)
- Real-Time Driver Monitoring Systems through Modality and View Analysis [28.18784311981388]
Driver distractions are known to be the dominant cause of road accidents.
State-of-the-art methods prioritize accuracy while ignoring latency.
We propose time-effective detection models by neglecting the temporal relation between video frames.
arXiv Detail & Related papers (2022-10-17T21:22:41Z)
- Real Time Action Recognition from Video Footage [0.5219568203653523]
Video surveillance cameras have added a new dimension to crime detection.
This research focuses on integrating state-of-the-art Deep Learning methods to ensure a robust pipeline for autonomous surveillance for detecting violent activities.
arXiv Detail & Related papers (2021-12-13T07:27:41Z)
- Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction
Detection in Videos [91.29436920371003]
We propose a simple yet effective architecture named Spatial-Temporal HOI Detection (ST-HOI).
We use temporal information such as human and object trajectories, correctly-localized visual features, and spatial-temporal masking pose features.
We construct a new video HOI benchmark dubbed VidHOI where our proposed approach serves as a solid baseline.
arXiv Detail & Related papers (2021-05-25T07:54:35Z)
- MULTICAST: MULTI Confirmation-level Alarm SysTem based on CNN and LSTM
to mitigate false alarms for handgun detection in video-surveillance [11.626928736124038]
We propose MULTICAST, a Multi Confirmation-level Alarm SysTem based on CNN and Long Short Term Memory (LSTM) networks.
Our experiments show that MULTICAST reduces the number of false alarms by 80% compared with a Faster R-CNN-based single-image detector.
arXiv Detail & Related papers (2021-04-23T15:07:58Z)
- ACDnet: An action detection network for real-time edge computing based
on flow-guided feature approximation and memory aggregation [8.013823319651395]
ACDnet is a compact action detection network targeting real-time edge computing.
It exploits the temporal coherence between successive video frames to approximate CNN features rather than naively extracting them.
It can robustly achieve detection well above real-time (75 FPS).
arXiv Detail & Related papers (2021-02-26T14:06:31Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object
Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- An Analysis of Deep Object Detectors For Diver Detection [19.14344722263869]
We produce a dataset of approximately 105,000 annotated images of divers sourced from videos.
We train a variety of state-of-the-art deep neural networks for object detection, including SSD with Mobilenet, Faster R-CNN, and YOLO.
Based on our results, we recommend Tiny-YOLOv4 for real-time applications on robots.
arXiv Detail & Related papers (2020-11-25T01:50:32Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.