YOLOpeds: Efficient Real-Time Single-Shot Pedestrian Detection for Smart
Camera Applications
- URL: http://arxiv.org/abs/2007.13404v2
- Date: Thu, 29 Oct 2020 16:23:12 GMT
- Title: YOLOpeds: Efficient Real-Time Single-Shot Pedestrian Detection for Smart
Camera Applications
- Authors: Christos Kyrkou
- Abstract summary: This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deployment of deep-learning-based pedestrian detection in smart camera applications.
A computationally efficient architecture based on separable convolutions is introduced, integrating dense connections across layers and multi-scale feature fusion.
Overall, YOLOpeds provides real-time sustained operation of over 30 frames per second with detection rates in the range of 86%, outperforming existing deep learning models.
- Score: 2.588973722689844
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep Learning-based object detectors can enhance the capabilities of smart
camera systems in a wide spectrum of machine vision applications including
video surveillance, autonomous driving, robots and drones, smart factory, and
health monitoring. Pedestrian detection plays a key role in all these
applications and deep learning can be used to construct accurate
state-of-the-art detectors. However, such complex paradigms do not scale easily
and are not traditionally implemented in resource-constrained smart cameras for
on-device processing, which offers significant advantages in situations where
real-time monitoring and robustness are vital. Efficient neural networks can
not only enable mobile applications and on-device experiences but can also be a
key enabler of privacy and security, allowing a user to gain the benefits of
neural networks without needing to send their data to the server to be
evaluated. This work addresses the challenge of achieving a good trade-off
between accuracy and speed for efficient deployment of deep-learning-based
pedestrian detection in smart camera applications. A computationally efficient
architecture based on separable convolutions is introduced; it integrates
dense connections across layers and multi-scale feature fusion to
improve representational capacity while decreasing the number of parameters and
operations. In particular, the contributions of this work are the following: 1)
an efficient backbone combining multi-scale feature operations, 2) a more
elaborate loss function for improved localization, and 3) an anchor-less approach
for detection. The proposed approach, called YOLOpeds, is evaluated using the
PETS2009 surveillance dataset on 320x320 images. Overall, YOLOpeds provides
real-time sustained operation of over 30 frames per second with detection rates
in the range of 86%, outperforming existing deep learning models.
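The abstract's claim that separable convolutions reduce parameters and operations can be illustrated with a small counting sketch. This is not the YOLOpeds architecture: the layer sizes (64 input channels, 128 output channels, a 3x3 kernel, a 40x40 feature map) and the function names are hypothetical, chosen only to show the arithmetic for a generic depthwise separable layer.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def conv_macs(params: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulates: every weight is applied once per output position."""
    return params * out_h * out_w


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)    # 73728
    sep = separable_conv_params(c_in, c_out, k)   # 8768
    print(f"standard: {std} weights, separable: {sep} weights")
    print(f"reduction factor: {std / sep:.2f}x")
    # Assumed 40 x 40 output feature map (a 320x320 input downsampled by 8).
    print(f"standard MACs: {conv_macs(std, 40, 40)}")
    print(f"separable MACs: {conv_macs(sep, 40, 40)}")
```

For these assumed sizes the separable layer needs roughly 8.4 times fewer weights and multiply-accumulates than the standard layer. The saving grows with the kernel size and the number of output channels.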
Related papers
- Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors [6.166992288822812]
Multi-Object Tracking plays a critical role in ensuring safer and more efficient navigation through complex traffic scenarios.
This paper presents a novel deep learning-based method that integrates radar and camera data to enhance the accuracy and robustness of Multi-Object Tracking in autonomous driving systems.
arXiv Detail & Related papers (2024-07-10T21:09:09Z)
- PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search [64.28335667655129]
Multiple object tracking is a critical task in autonomous driving.
As tracking accuracy improves, neural networks become increasingly complex, posing challenges for their practical application in real driving scenarios due to the high level of latency.
In this paper, we explore the use of the neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy.
arXiv Detail & Related papers (2024-03-23T04:18:49Z)
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection
in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
- Lightweight Delivery Detection on Doorbell Cameras [9.735137325682825]
In this work, we investigate an important home application, video-based delivery detection, and present a simple lightweight pipeline for this task.
Our method relies on motionconstrained to generate a set of coarse activity cues followed by their classification with a mobile-friendly 3DCNN network.
arXiv Detail & Related papers (2023-05-13T01:28:28Z)
- Agile gesture recognition for capacitive sensing devices: adapting
on-the-job [55.40855017016652]
We demonstrate a hand gesture recognition system that uses signals from capacitive sensors embedded into the etee hand controller.
The controller generates real-time signals from each of the wearer's five fingers.
We use a machine learning technique to analyse the time series signals and identify three features that can represent 5 fingers within 500 ms.
arXiv Detail & Related papers (2023-05-12T17:24:02Z)
- Baby Physical Safety Monitoring in Smart Home Using Action Recognition
System [0.0]
We present a novel framework combining transfer learning techniques with a Conv2D LSTM layer to extract features from the pre-trained I3D model on the Kinetics dataset.
We developed a benchmark dataset and an automated model that uses LSTM convolution with I3D (ConvLSTM-I3D) for recognizing and predicting baby activities in a smart baby room.
arXiv Detail & Related papers (2022-10-22T19:00:14Z)
- A Wireless-Vision Dataset for Privacy Preserving Human Activity
Recognition [53.41825941088989]
A new WiFi-based and video-based neural network (WiNN) is proposed to improve the robustness of activity recognition.
Our results show that the WiVi data set satisfies the primary demand and that all three branches in the proposed pipeline maintain more than 80% activity recognition accuracy.
arXiv Detail & Related papers (2022-05-24T10:49:11Z)
- Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device [53.323878851563414]
We propose a compiler-aware unified framework incorporating network enhancement and pruning search with reinforcement learning techniques.
Specifically, a generator Recurrent Neural Network (RNN) is employed to provide the unified scheme for both network enhancement and pruning search automatically.
The proposed framework achieves real-time 3D object detection on mobile devices with competitive detection performance.
arXiv Detail & Related papers (2020-12-26T19:41:15Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- A Markerless Deep Learning-based 6 Degrees of Freedom PoseEstimation for
with Mobile Robots using RGB Data [3.4806267677524896]
We propose a method to deploy state-of-the-art neural networks for real-time 3D object localization on augmented reality devices.
We focus on fast 2D detection approaches that extract the 3D pose of the object quickly and accurately using only 2D input.
For the 6D annotation of 2D images, we developed an annotation tool, which is, to our knowledge, the first open source tool to be available.
arXiv Detail & Related papers (2020-01-16T09:13:31Z)