Fast and Lightweight Scene Regressor for Camera Relocalization
- URL: http://arxiv.org/abs/2212.01830v1
- Date: Sun, 4 Dec 2022 14:41:20 GMT
- Title: Fast and Lightweight Scene Regressor for Camera Relocalization
- Authors: Thuan B. Bui, Dinh-Tuan Tran, and Joo-Ho Lee
- Abstract summary: Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Camera relocalization involving a prior 3D reconstruction plays a crucial
role in many mixed reality and robotics applications. Estimating the camera
pose directly with respect to pre-built 3D models can be prohibitively
expensive for several applications with limited storage and/or communication
bandwidth. Although recent scene and absolute pose regression methods have
become popular for efficient camera localization, most of them are
computationally intensive and struggle to deliver real-time inference under
high-accuracy constraints. This study proposes a simple scene regression
method that requires only a multi-layer perceptron network for mapping scene
coordinates to achieve accurate camera pose estimations. The proposed approach
uses sparse descriptors to regress the scene coordinates, instead of a dense
RGB image. The use of sparse features provides several advantages. First, the
proposed regressor network is substantially smaller than those reported in
previous studies. This makes our system highly efficient and scalable. Second,
the pre-built 3D models provide the most reliable and robust 2D-3D matches.
Therefore, learning from them can lead to an awareness of equivalent features
and substantially improve the generalization performance. A detailed analysis
of our approach and extensive evaluations using existing datasets are provided
to support the proposed method. The implementation details are available at
https://github.com/aislab/feat2map
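The core idea above can be sketched in a few lines: a small MLP maps each sparse keypoint descriptor directly to a 3D scene coordinate, and the resulting 2D-3D matches are then passed to a PnP solver for pose estimation. The layer sizes, descriptor dimensionality, and class name below are illustrative assumptions, not the authors' exact architecture.

```python
import numpy as np

class SceneCoordMLP:
    """Minimal sketch of a scene-coordinate regressor: an MLP that maps a
    sparse feature descriptor (here assumed 256-D, SuperPoint-style) to a
    3D scene coordinate. Hidden sizes are illustrative assumptions."""

    def __init__(self, in_dim=256, hidden=512, seed=0):
        rng = np.random.default_rng(seed)
        # Two hidden layers with He-style init; the head predicts (x, y, z).
        self.W1 = rng.standard_normal((in_dim, hidden)) * np.sqrt(2.0 / in_dim)
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, hidden)) * np.sqrt(2.0 / hidden)
        self.b2 = np.zeros(hidden)
        self.W3 = rng.standard_normal((hidden, 3)) * np.sqrt(2.0 / hidden)
        self.b3 = np.zeros(3)

    def forward(self, descriptors):
        # descriptors: (N, in_dim) sparse keypoint descriptors from one image.
        h = np.maximum(0.0, descriptors @ self.W1 + self.b1)  # ReLU
        h = np.maximum(0.0, h @ self.W2 + self.b2)
        return h @ self.W3 + self.b3  # (N, 3) predicted scene coordinates

# Regress coordinates for 100 keypoints; in practice, the predicted 2D-3D
# matches would be fed to a PnP + RANSAC solver to recover the camera pose.
model = SceneCoordMLP()
descs = np.random.default_rng(1).standard_normal((100, 256))
coords = model.forward(descs)
print(coords.shape)  # (100, 3)
```

Because the network consumes only a handful of descriptors per image rather than a dense RGB tensor, such a regressor stays small, which is what makes the approach lightweight.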
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving.
It predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
It is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model that is trained once and then deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database.
Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
arXiv Detail & Related papers (2023-07-19T13:40:45Z)
- SparsePose: Sparse-View Camera Pose Regression and Refinement [32.74890928398753]
We propose SparsePose for recovering accurate camera poses given a sparse set of wide-baseline images (fewer than 10).
The method learns to regress initial camera poses and then iteratively refine them after training on a large-scale dataset of objects.
We also demonstrate our pipeline for high-fidelity 3D reconstruction using only 5-9 images of an object.
arXiv Detail & Related papers (2022-11-29T05:16:07Z)
- Deep Camera Pose Regression Using Pseudo-LiDAR [1.5959408994101303]
We show that converting depth maps into pseudo-LiDAR signals is a better representation for camera localization tasks.
We propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose.
arXiv Detail & Related papers (2022-02-28T20:30:37Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.