How semantic and geometric information mutually reinforce each other in ToF object localization
- URL: http://arxiv.org/abs/2008.12002v1
- Date: Thu, 27 Aug 2020 09:13:26 GMT
- Title: How semantic and geometric information mutually reinforce each other in ToF object localization
- Authors: Antoine Vanderschueren, Victor Joos, Christophe De Vleeschouwer
- Abstract summary: We propose a novel approach to localize a 3D object from the intensity and depth information images provided by a Time-of-Flight (ToF) sensor.
Our proposed two-step approach improves segmentation and localization accuracy by a significant margin compared to a conventional CNN architecture.
- Score: 19.47618043504105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach to localize a 3D object from the intensity and
depth information images provided by a Time-of-Flight (ToF) sensor. Our method
uses two CNNs. The first one uses raw depth and intensity images as input, to
segment the floor pixels, from which the extrinsic parameters of the camera are
estimated. The second CNN is in charge of segmenting the object-of-interest. As
a main innovation, it exploits the calibration estimated from the prediction of
the first CNN to represent the geometric depth information in a coordinate
system that is attached to the ground, and is thus independent of the camera
elevation. In practice, both the height of pixels with respect to the ground,
and the orientation of normals to the point cloud are provided as input to the
second CNN. Given the segmentation predicted by the second CNN, the object is
localized based on point cloud alignment with a reference model. Our
experiments demonstrate that our proposed two-step approach improves
segmentation and localization accuracy by a significant margin, both compared
to a conventional CNN architecture that ignores calibration and height maps,
and compared to PointNet++.
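To make the second step concrete, here is a minimal NumPy sketch of the two geometric inputs described above, assuming a pinhole camera with intrinsics K and a floor plane n·x + d = 0 fitted to the floor pixels segmented by the first CNN; all names are illustrative rather than the authors' code.

```python
import numpy as np

def ground_frame_features(depth, K, n, d):
    """Per-pixel height above the floor and point-cloud normals.

    depth: (H, W) ToF depth map in meters; K: 3x3 intrinsics.
    n, d: floor plane n.x + d = 0, assumed fitted (e.g., by least
    squares) to the floor pixels segmented by the first CNN.
    Both outputs are independent of the camera elevation, which is
    the point of the ground-attached representation.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    pts = (pix @ np.linalg.inv(K).T) * depth[..., None]  # back-projection

    n = n / np.linalg.norm(n)
    height = pts @ n + d  # signed distance to the floor plane

    # Normals from finite differences on the back-projected point cloud.
    dx = np.gradient(pts, axis=1)
    dy = np.gradient(pts, axis=0)
    normals = np.cross(dx, dy)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8
    return height, normals
```

Given the object mask predicted by the second CNN, the masked points in pts can then be aligned to the reference model (for instance with an ICP-style registration) to recover the object pose.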
Related papers
- An evaluation of CNN models and data augmentation techniques in hierarchical localization of mobile robots [0.0]
This work evaluates CNN models and data augmentation techniques for the hierarchical localization of a mobile robot.
An ablation study of different state-of-the-art CNN models used as backbones is presented.
A variety of data-augmentation visual effects are proposed to address the robot's visual localization.
arXiv Detail & Related papers (2024-07-15T10:20:00Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose (see the sketch below).
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
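For the rigid-pose step mentioned above, once dense 2D-3D correspondences are in hand, a standard RANSAC-PnP solver recovers the transform. The sketch below uses OpenCV as a stand-in for the paper's learned alignment; all names are illustrative.

```python
import cv2
import numpy as np

def pose_from_dense_correspondences(pts3d, pts2d, K):
    """Recover the camera pose from matched 2D-3D correspondences.

    pts3d: (N, 3) point-cloud points; pts2d: (N, 2) matched pixels;
    K: 3x3 intrinsics. Standard RANSAC-PnP stands in for the paper's
    learned alignment.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64),
        K.astype(np.float64), None)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return R, tvec.reshape(3), inliers
```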
- Random Padding Data Augmentation [23.70951896315126]
A convolutional neural network (CNN) must learn to recognize the same object at different positions in an image.
The usefulness of the features' spatial information in CNNs has not been well investigated.
We introduce Random Padding, a new type of padding method for training CNNs (sketched below).
arXiv Detail & Related papers (2023-02-17T04:15:33Z)
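One plausible reading of Random Padding, sketched below under that assumption: zero-pad the image on a randomly chosen vertical and horizontal side, then crop back to the original size, so the object's position shifts while the input shape stays fixed. The published scheme may differ in its details.

```python
import numpy as np

def random_padding(image, max_pad=16, rng=None):
    """Shift image content by zero-padding random sides, keeping the shape.

    Illustrative sketch only: pad with zeros on one randomly chosen
    vertical and one horizontal side, then crop back to the original
    size, which translates the object inside a fixed-size input.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    dy = int(rng.integers(0, max_pad + 1))
    dx = int(rng.integers(0, max_pad + 1))
    top, bottom = (dy, 0) if rng.random() < 0.5 else (0, dy)
    left, right = (dx, 0) if rng.random() < 0.5 else (0, dx)
    pad = ((top, bottom), (left, right)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    # Crop a window of the original size; the offset equals the amount
    # padded on the far side, so the content actually moves.
    return padded[bottom:bottom + h, right:right + w]
```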
- Geometry-Aware Network for Domain Adaptive Semantic Segmentation [64.00345743710653]
We propose a novel Geometry-Aware Network for Domain Adaptation (GANDA) to shrink the domain gaps.
We exploit 3D topology on the point clouds generated from RGB-D images for coordinate-color disentanglement and pseudo-label refinement in the target domain.
Our model outperforms state-of-the-art methods on GTA5->Cityscapes and SYNTHIA->Cityscapes.
arXiv Detail & Related papers (2022-12-02T00:48:44Z)
- Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation [21.424035166174352]
State-of-the-art approaches typically use different backbones to extract features for RGB and depth images.
We find that the essential reason for using two independent backbones is the "projection breakdown" problem.
We propose a simple yet effective method, denoted Uni6D, that explicitly takes the extra UV data along with RGB-D images as input (sketched below).
arXiv Detail & Related papers (2022-03-28T07:05:27Z)
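A plausible PyTorch sketch of the extra-UV-input idea from the Uni6D summary above: append normalized per-pixel image coordinates to the RGB-D tensor so a single backbone can relate each depth value to its projection position. The paper's exact normalization and additional inputs may differ.

```python
import torch

def append_uv_channels(rgbd):
    """Append normalized pixel-coordinate (UV) channels to an RGB-D tensor.

    rgbd: (B, 4, H, W) tensor of RGB + depth. The two extra channels
    give each pixel its (u, v) position in [0, 1], a minimal reading
    of the UV input described in the summary above.
    """
    b, _, h, w = rgbd.shape
    v, u = torch.meshgrid(
        torch.linspace(0, 1, h, device=rgbd.device),
        torch.linspace(0, 1, w, device=rgbd.device),
        indexing="ij",
    )
    uv = torch.stack([u, v]).expand(b, -1, -1, -1)  # (B, 2, H, W)
    return torch.cat([rgbd, uv], dim=1)             # (B, 6, H, W)
```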
- GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network [11.332580333969302]
This work presents a new solution with a set of improvements that increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle convolution on non-Euclidean data and can be applied to irregular image regions within a topological structure.
Our method provides comparable and promising results, with a high prediction accuracy of 89% on the public KITTI and Make3D datasets.
arXiv Detail & Related papers (2021-12-13T16:46:25Z)
- Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph.
arXiv Detail & Related papers (2021-11-16T08:01:16Z)
- Category-Level Metric Scale Object Shape and Pose Estimation [73.92460712829188]
We propose a framework that jointly estimates a metric scale shape and pose from a single RGB image.
We validated our method on both synthetic and real-world datasets to evaluate category-level object pose and shape.
arXiv Detail & Related papers (2021-09-01T12:16:46Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers [111.55817466296402]
We introduce Perspective Crop Layers (PCLs), a form of perspective crop of the region of interest based on the camera geometry.
PCLs deterministically remove location-dependent perspective effects while leaving end-to-end training and the number of parameters of the underlying neural network unaffected.
PCL offers an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry-aware (see the sketch below).
arXiv Detail & Related papers (2020-11-27T08:48:43Z)
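A rough sketch of the geometry behind a perspective crop, assuming a pinhole model: rotate a virtual camera so its optical axis passes through the region of interest, then warp through the induced homography. This removes the location-dependent perspective distortion that a plain axis-aligned crop keeps; it follows the general idea, not the paper's exact layer.

```python
import cv2
import numpy as np

def perspective_crop(image, K, roi_center, out_size=(128, 128), f_virtual=None):
    """Warp a region of interest into a virtual camera facing it head-on.

    K: 3x3 intrinsics; roi_center: (u, v) pixel of the ROI center;
    out_size: (width, height) of the crop. Names are illustrative.
    """
    # Ray through the ROI center in camera coordinates.
    ray = np.linalg.inv(K) @ np.array([roi_center[0], roi_center[1], 1.0])
    ray /= np.linalg.norm(ray)
    z = np.array([0.0, 0.0, 1.0])
    # Rotation mapping the optical axis onto that ray (Rodrigues form).
    axis = np.cross(z, ray)
    angle = np.arccos(np.clip(z @ ray, -1.0, 1.0))
    R, _ = cv2.Rodrigues(axis / (np.linalg.norm(axis) + 1e-12) * angle)
    # Virtual intrinsics centered on the output crop.
    f = K[0, 0] if f_virtual is None else f_virtual
    Kv = np.array([[f, 0, out_size[0] / 2],
                   [0, f, out_size[1] / 2],
                   [0, 0, 1.0]])
    # Homography from the original image to the virtual, ROI-centered view.
    H = Kv @ R.T @ np.linalg.inv(K)
    return cv2.warpPerspective(image, H, out_size)
```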
- Depth-Adapted CNN for RGB-D cameras [0.3727773051465455]
Conventional 2D Convolutional Neural Networks (CNNs) extract features from an input image by applying linear filters.
We tackle the problem of improving the classical RGB CNN methods by using the depth information provided by the RGB-D cameras.
This paper proposes a novel and generic procedure to articulate both photometric and geometric information in CNN architecture.
arXiv Detail & Related papers (2020-09-21T15:58:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.