Towards Dense People Detection with Deep Learning and Depth images
- URL: http://arxiv.org/abs/2007.07171v1
- Date: Tue, 14 Jul 2020 16:43:02 GMT
- Title: Towards Dense People Detection with Deep Learning and Depth images
- Authors: David Fuentes-Jimenez and Cristina Losada-Gutierrez and David
Casillas-Perez and Javier Macias-Guarasa and Roberto Martin-Lopez and Daniel
Pizarro and Carlos A. Luna
- Abstract summary: This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
- Score: 9.376814409561726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a DNN-based system that detects multiple people from a
single depth image. Our neural network processes a depth image and outputs a
likelihood map in image coordinates, where each detection corresponds to a
Gaussian-shaped local distribution, centered at the person's head. The
likelihood map encodes both the number of detected people and their 2D image
positions, and can be used to recover the 3D position of each person using the
depth image and the camera calibration parameters. Our architecture is compact,
using separated convolutions to increase performance, and runs in real time
on low-budget GPUs. We initially train the network on simulated data, then
fine-tune it with a relatively small amount of real data. We show
this strategy to be effective, producing networks that generalize to work with
scenes different from those used during training. We thoroughly compare our
method against the existing state-of-the-art, including both classical and
DNN-based solutions. Our method outperforms existing methods and can accurately
detect people in scenes with significant occlusions.
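Since the likelihood map is only described at a high level, the following is a minimal sketch (not the authors' code) of how such an output could be turned into detections and 3D positions: local maxima of the map give 2D head locations, which a pinhole model back-projects using the depth image and the calibration. The intrinsics (fx, fy, cx, cy), the peak threshold, and the window size are assumptions.

```python
# Hedged sketch: peak extraction from a Gaussian-shaped likelihood map,
# followed by pinhole back-projection to 3D. Parameter names are assumed.
import numpy as np
from scipy.ndimage import maximum_filter

def detect_peaks(likelihood, threshold=0.5, size=11):
    """Return (row, col) coordinates of local maxima above `threshold`."""
    local_max = maximum_filter(likelihood, size=size) == likelihood
    return np.argwhere(local_max & (likelihood > threshold))

def backproject(u, v, z, fx, fy, cx, cy):
    """Recover the camera-frame 3D point for pixel (u, v) with depth z."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Usage: each peak plus the depth image yields one 3D head position, e.g.
# for (v, u) in detect_peaks(likelihood_map):
#     p3d = backproject(u, v, depth_image[v, u], fx, fy, cx, cy)
```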
Related papers
- Understanding Depth Map Progressively: Adaptive Distance Interval
Separation for Monocular 3d Object Detection [38.96129204108353]
Several monocular 3D detection techniques rely on auxiliary depth maps from the depth estimation task.
We propose a framework named the Adaptive Distance Interval Separation Network (ADISN) that adopts a novel perspective on understanding depth maps.
arXiv Detail & Related papers (2023-06-19T13:32:53Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
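As a rough illustration of the spatial-propagation idea (not GraphCSPN itself, which builds dynamic graph neighbourhoods with learned affinities), a fixed-grid propagation step can be sketched as follows; the 8-neighbour layout and hand-set weights are assumptions.

```python
# Toy sketch of CSPN-style spatial propagation for depth completion.
# GraphCSPN replaces this fixed 3x3 grid with dynamic graph neighbourhoods
# and learned affinities; a real system would also re-inject the valid
# sparse measurements after each step.
import numpy as np

def propagate(depth, affinity, steps=3):
    """Affinity-weighted 3x3 propagation, iterated `steps` times.

    depth:    (H, W) initial/sparse depth map
    affinity: (H, W, 8) non-negative weights for the 8 neighbours
    """
    offsets = [(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)]
    d = depth.copy()
    for _ in range(steps):
        acc = np.zeros_like(d)
        wsum = np.zeros_like(d)
        for k, (dy, dx) in enumerate(offsets):
            # np.roll wraps at image borders; acceptable for a toy example
            shifted = np.roll(np.roll(d, dy, axis=0), dx, axis=1)
            acc += affinity[..., k] * shifted
            wsum += affinity[..., k]
        d = acc / np.maximum(wsum, 1e-8)
    return d
```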
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real-world applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network [11.332580333969302]
This work presents a new solution with a set of improvements that increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle the convolution on non-Euclidean data and it can be applied to irregular image regions within a topological structure.
Our method provided comparable and promising results with a high prediction accuracy of 89% on the public KITTI and Make3D datasets.
arXiv Detail & Related papers (2021-12-13T16:46:25Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
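For context, the classical TSDF update that the second step replaces with a learned fusion can be sketched as follows; the truncation distance `trunc` and the weighting scheme are assumptions, not values from the paper.

```python
# Hedged sketch of classical weighted-average TSDF fusion: each new depth
# observation updates a running truncated signed-distance volume.
import numpy as np

def integrate(tsdf, weight, sdf_new, w_new=1.0, trunc=0.05):
    """Fuse one observation into the running TSDF (arrays share one shape)."""
    sdf_new = np.clip(sdf_new / trunc, -1.0, 1.0)           # truncate to [-1, 1]
    tsdf = (weight * tsdf + w_new * sdf_new) / (weight + w_new)
    return tsdf, weight + w_new
```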
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- DPDnet: A Robust People Detector using Deep Learning with an Overhead Depth Camera [9.376814409561726]
We propose a method that detects multiple people from a single overhead depth image with high reliability.
Our neural network, called DPDnet, is composed of two fully-convolutional encoder-decoder blocks built from residual layers.
The experimental work shows that DPDnet outperforms state-of-the-art methods, with accuracies greater than 99% on three different publicly available datasets.
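A minimal PyTorch-style sketch of a fully-convolutional residual encoder-decoder of the kind the summary describes (illustrative layer sizes, not the paper's configuration):

```python
# Hedged sketch: residual block plus a tiny encoder-decoder that maps a
# one-channel depth image to a one-channel likelihood map.
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.body(x) + x)  # residual connection

encoder_decoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), ResBlock(32),   # encode
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),        # decode
    nn.Sigmoid())                                             # likelihood map
```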
arXiv Detail & Related papers (2020-06-01T16:28:25Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points [14.254472131009653]
Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation.
Cost volume based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems.
We propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally (c) densifying this sparse set of 3D points using CNNs.
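As a reference point for step (b), the classical closed-form (DLT) triangulation of one matched interest point from two calibrated views can be sketched as follows; the paper learns to match and triangulate rather than solving this linear system directly.

```python
# Hedged sketch of linear (DLT) two-view triangulation of one matched point.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices; x1, x2: matched pixels (u, v)."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)       # null space gives the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]               # homogeneous -> Euclidean 3D point
```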
arXiv Detail & Related papers (2020-03-19T17:56:41Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, instead of different views, we rely on depth-from-focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.