Multi-camera Torso Pose Estimation using Graph Neural Networks
- URL: http://arxiv.org/abs/2007.14126v1
- Date: Tue, 28 Jul 2020 11:14:02 GMT
- Title: Multi-camera Torso Pose Estimation using Graph Neural Networks
- Authors: Daniel Rodriguez-Criado, Pilar Bachiller, Pablo Bustos, George Vogiatzis, Luis J. Manso
- Abstract summary: Estimating the location and orientation of humans is an essential skill for service and assistive robots.
The proposal presented in this paper makes use of graph neural networks to merge the information acquired from multiple camera sources.
The experiments, conducted in an apartment with three cameras, benchmarked two different graph neural network implementations and a third architecture based on fully connected layers.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Estimating the location and orientation of humans is an essential skill for
service and assistive robots. To achieve a reliable estimation in a wide area
such as an apartment, multiple RGBD cameras are frequently used. Firstly, these
setups are relatively expensive. Secondly, they seldom perform an effective
data fusion using the multiple camera sources at an early stage of the
processing pipeline. Occlusions and partial views make this second point very
relevant in these scenarios. The proposal presented in this paper makes use of
graph neural networks to merge the information acquired from multiple camera
sources, achieving a mean absolute error below 125 mm for the location and 10
degrees for the orientation using low-resolution RGB images. The experiments,
conducted in an apartment with three cameras, benchmarked two different graph
neural network implementations and a third architecture based on fully
connected layers. The software used has been released as open-source in a
public repository (https://github.com/vangiel/WheresTheFellow).
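The fusion idea lends itself to a compact illustration. Below is a minimal sketch, written with PyTorch Geometric, of the approach the abstract describes: skeleton keypoints detected by each camera become graph nodes, message passing mixes intra- and cross-camera information, and a pooled readout regresses the torso location and orientation. The class name, feature layout, and sin/cos orientation encoding are illustrative assumptions, not the released WheresTheFellow implementation.

```python
# Hypothetical sketch of GNN-based multi-camera fusion for torso pose.
import torch
import torch.nn as nn
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool


class TorsoPoseGNN(nn.Module):
    def __init__(self, in_dim=6, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        # Outputs (x, y) location plus (sin, cos) of the torso orientation;
        # the sin/cos encoding sidesteps the wrap-around at +/-180 degrees.
        self.head = nn.Linear(hidden, 4)

    def forward(self, data):
        h = torch.relu(self.conv1(data.x, data.edge_index))
        h = torch.relu(self.conv2(h, data.edge_index))
        g = global_mean_pool(h, data.batch)  # one vector per person-graph
        return self.head(g)


# Toy input: 2 cameras x 3 keypoints; each node carries (u, v, confidence,
# camera id, ...) features, with edges inside and across camera views.
x = torch.rand(6, 6)
edge_index = torch.tensor([[0, 1, 2, 3, 4, 5, 0, 3],
                           [1, 2, 0, 4, 5, 3, 3, 0]])
data = Data(x=x, edge_index=edge_index, batch=torch.zeros(6, dtype=torch.long))
pred = TorsoPoseGNN()(data)  # shape (1, 4): x, y, sin, cos
```

Pooling over all keypoint nodes is what lets the network absorb a variable number of cameras and partial views into a single estimate.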
Related papers
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images [58.720142291102135]
We present a novel dataset named as HPointLoc, specially designed for exploring capabilities of visual place recognition in indoor environment.
The dataset is based on the popular Habitat simulator, in which indoor scenes can be generated using both its own sensor data and open datasets.
arXiv Detail & Related papers (2022-12-30T12:20:56Z)
- Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
arXiv Detail & Related papers (2022-12-04T14:41:20Z)
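As a rough illustration of that idea, the sketch below regresses a 3-D scene coordinate from each sparse keypoint descriptor with a plain multi-layer perceptron. The names and dimensions are assumptions for illustration, not the paper's code; the camera pose would then be recovered from the predicted 2D-3D correspondences, e.g. with a PnP solver inside RANSAC.

```python
# Hypothetical sketch: sparse descriptors -> 3-D scene coordinates.
import torch
import torch.nn as nn


class SceneRegressor(nn.Module):
    """MLP mapping a sparse keypoint descriptor to a 3-D scene coordinate."""

    def __init__(self, desc_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(desc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # (X, Y, Z) in the scene frame
        )

    def forward(self, descriptors):   # (N, desc_dim)
        return self.mlp(descriptors)  # (N, 3)


descs = torch.rand(500, 256)          # descriptors of detected keypoints
scene_xyz = SceneRegressor()(descs)   # predicted 3-D scene coordinates
```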
- Deep Camera Pose Regression Using Pseudo-LiDAR [1.5959408994101303]
We show that converting depth maps into pseudo-LiDAR signals is a better representation for camera localization tasks.
We propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose.
arXiv Detail & Related papers (2022-02-28T20:30:37Z)
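The depth-to-pseudo-LiDAR conversion reduces to classic backprojection through the camera intrinsics: every pixel (u, v) with depth z is lifted to a 3-D point, yielding a LiDAR-like point cloud. A minimal NumPy sketch, assuming a metric depth map and known intrinsics (fx, fy, cx, cy); the function name and the validity filter are illustrative.

```python
import numpy as np


def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Backproject a metric depth map into a LiDAR-like 3-D point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep pixels with valid depth


cloud = depth_to_pseudo_lidar(np.random.rand(480, 640), 525.0, 525.0, 319.5, 239.5)
```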
- Graph Neural Networks for Cross-Camera Data Association [3.490148531239259]
Cross-camera image data association is essential for many multi-camera computer vision tasks.
This paper proposes an efficient approach to cross-camera data association that seeks a global solution.
arXiv Detail & Related papers (2022-01-17T09:52:39Z)
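A hedged sketch of how such an association step might look: detections are graph nodes carrying appearance embeddings, candidate cross-camera pairs are edges, one round of message passing refines the node features, and an edge MLP scores each pair as "same identity" or not. This is an assumption-laden illustration with PyTorch Geometric, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class AssociationGNN(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.conv = GCNConv(feat_dim, hidden)  # one round of message passing
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, edge_index):            # (N, F), (2, E)
        h = torch.relu(self.conv(x, edge_index))
        src, dst = edge_index
        pair = torch.cat([h[src], h[dst]], dim=-1)
        return torch.sigmoid(self.edge_mlp(pair)).squeeze(-1)  # (E,) scores


feats = torch.rand(8, 128)                          # detection embeddings
edges = torch.tensor([[0, 0, 1, 2], [4, 5, 5, 6]])  # cross-camera candidates
scores = AssociationGNN()(feats, edges)
```

A global assignment can then be read off by thresholding the scores or solving a bipartite matching over them.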
- Unsupervised Depth Completion with Calibrated Backprojection Layers [79.35651668390496]
We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud.
It is trained using a video stream and corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera.
At inference time, the calibration of the camera, which can be different from the one used for training, is fed as an input to the network along with the sparse point cloud and a single image.
arXiv Detail & Related papers (2021-08-24T05:41:59Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors and can also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
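A minimal sketch of a network with that input/output contract: a small fully convolutional encoder-decoder that turns one single-channel depth image into a per-pixel likelihood map. The architecture and layer sizes are assumptions for illustration, not the paper's DNN.

```python
import torch
import torch.nn as nn


class DepthPeopleDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, depth):                  # (B, 1, H, W) depth image
        return torch.sigmoid(self.net(depth))  # (B, 1, H, W) likelihood map


heatmap = DepthPeopleDetector()(torch.rand(1, 1, 480, 640))
# Local maxima of the map would give candidate person detections.
```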
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution that operates in 3D space, thereby avoiding incorrect decisions in the 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
- On Localizing a Camera from a Single Image [9.049593493956008]
We show that it is possible to estimate the location of a camera from a single image taken by the camera.
We show that, using a judicious combination of projective geometry, neural networks, and crowd-sourced annotations from human workers, it is possible to position 95% of the images in our test data set to within 12 m.
arXiv Detail & Related papers (2020-03-24T05:09:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.