Deep Camera Pose Regression Using Pseudo-LiDAR
- URL: http://arxiv.org/abs/2203.00080v1
- Date: Mon, 28 Feb 2022 20:30:37 GMT
- Title: Deep Camera Pose Regression Using Pseudo-LiDAR
- Authors: Ali Raza, Lazar Lolic, Shahmir Akhter, Alfonso Dela Cruz, Michael Liut
- Abstract summary: We show that converting depth maps into pseudo-LiDAR signals is a better representation for camera localization tasks.
We propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose.
- Score: 1.5959408994101303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An accurate and robust large-scale localization system is an integral
component for active areas of research such as autonomous vehicles and
augmented reality. To this end, many learning algorithms have been proposed
that predict 6DOF camera pose from RGB or RGB-D images. However, previous
methods that incorporate depth typically treat the data the same way as RGB
images, often adding depth maps as additional channels to RGB images and
passing them through convolutional neural networks (CNNs). In this paper, we
show that converting depth maps into pseudo-LiDAR signals, previously shown to
be useful for 3D object detection, is a better representation for camera
localization: the back-projected point clouds support more accurate 6DOF
camera pose estimation. We demonstrate this by first comparing the localization
accuracy of a network operating exclusively on pseudo-LiDAR representations
against networks operating exclusively on depth maps. We then propose FusionLoc, a
novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose.
FusionLoc is a dual-stream neural network that aims to remedy common issues
with typical 2D CNNs operating on RGB-D images. The results from this
architecture are compared against various other state-of-the-art deep pose
regression implementations using the 7 Scenes dataset. The findings are that
FusionLoc performs better than a number of other camera localization methods,
most notably being, on average, 0.33 m and 4.35° more accurate than RGB-D
PoseNet. By demonstrating the validity of pseudo-LiDAR signals over depth maps
for localization, this work introduces new considerations for implementing
large-scale localization systems.
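The core preprocessing step the abstract describes, converting a depth map into a pseudo-LiDAR signal, amounts to back-projecting every valid pixel into a 3D point cloud with the pinhole camera model. The sketch below is not the authors' code: the function name, the max_depth cutoff, and the intrinsics fx, fy, cx, cy are illustrative assumptions (the values in the toy example roughly follow the commonly used 7-Scenes Kinect parameters, but should be treated as placeholders).

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy, max_depth=10.0):
    """Back-project a metric depth map (H x W) into an (N, 3) pseudo-LiDAR
    point cloud in the camera frame using the pinhole camera model."""
    h, w = depth.shape
    # Pixel grid: u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx   # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy   # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Discard invalid (zero) and far-away depths, as is typical for pseudo-LiDAR.
    keep = (points[:, 2] > 0) & (points[:, 2] < max_depth)
    return points[keep]

# Toy example: a constant 2 m depth map with placeholder intrinsics.
depth = np.full((480, 640), 2.0, dtype=np.float32)
cloud = depth_to_pseudo_lidar(depth, fx=585.0, fy=585.0, cx=320.0, cy=240.0)
print(cloud.shape)  # -> (307200, 3)
```

A dual-stream pipeline in the spirit of FusionLoc would then pass such a point cloud to a point-cloud branch alongside an RGB branch before regressing the 6DOF pose; the abstract does not specify the exact architecture, so this note is only orientation.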
Related papers
- Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians [87.48403838439391]
3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense simultaneous localization and mapping (SLAM).
We propose the first RGB-only SLAM system with a dense 3D Gaussian map representation.
Our experiments on the Replica, TUM-RGBD, and ScanNet datasets indicate the effectiveness of globally optimized 3D Gaussians.
arXiv Detail & Related papers (2024-05-26T12:26:54Z)
- ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera [9.212504138203222]
We propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera.
Our system distinguishes itself by its ability to directly utilize raw IR and RGB images for transparent object geometry reconstruction.
Our experiments demonstrate that ASGrasp can achieve over 90% success rate for generalizable transparent object grasping.
arXiv Detail & Related papers (2024-05-09T09:44:51Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
arXiv Detail & Related papers (2022-12-04T14:41:20Z)
- Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation [21.424035166174352]
State-of-the-art approaches typically use different backbones to extract features for RGB and depth images.
We find that the essential reason for using two independent backbones is the "projection breakdown" problem.
We propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input.
arXiv Detail & Related papers (2022-03-28T07:05:27Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- Sparse Depth Completion with Semantic Mesh Deformation Optimization [4.03103540543081]
We propose a neural network with post-optimization, which takes an RGB image and sparse depth samples as input and predicts the complete depth map.
Our evaluation results outperform the existing work consistently on both indoor and outdoor datasets.
arXiv Detail & Related papers (2021-12-10T13:01:06Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of depth maps with a deep MVS technique, and 2) the fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation [54.666329929930455]
We present FFB6D, a bidirectional fusion network designed for 6D pose estimation from a single RGBD image.
We learn to combine appearance and geometry information for representation learning as well as output representation selection.
Our method outperforms the state-of-the-art by large margins on several benchmarks.
arXiv Detail & Related papers (2021-03-03T08:07:29Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module: adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
- Deep-Geometric 6 DoF Localization from a Single Image in Topo-metric Maps [39.05304338751328]
We describe a Deep-Geometric Localizer that is able to estimate the full 6 Degree of Freedom (DoF) global pose of the camera from a single image.
Our method divorces the mapping and the localization algorithms (stereo and mono) and allows accurate 6 DoF pose estimation in a previously mapped environment.
With potential VR/AR and localization applications in single camera devices such as mobile phones and drones, our hybrid algorithm compares favourably with the fully Deep-Learning based Pose-Net.
arXiv Detail & Related papers (2020-02-04T10:11:46Z)