Fast and Lightweight Scene Regressor for Camera Relocalization
- URL: http://arxiv.org/abs/2212.01830v1
- Date: Sun, 4 Dec 2022 14:41:20 GMT
- Title: Fast and Lightweight Scene Regressor for Camera Relocalization
- Authors: Thuan B. Bui, Dinh-Tuan Tran, and Joo-Ho Lee
- Abstract summary: Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
- Score: 1.6708069984516967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Camera relocalization involving a prior 3D reconstruction plays a crucial
role in many mixed reality and robotics applications. Estimating the camera
pose directly with respect to pre-built 3D models can be prohibitively
expensive for several applications with limited storage and/or communication
bandwidth. Although recent scene and absolute pose regression methods have
become popular for efficient camera localization, most of them are
computationally intensive, making it difficult to achieve real-time inference
under high-accuracy constraints. This study proposes a simple scene regression
method that requires only a multi-layer perceptron network for mapping scene
coordinates to achieve accurate camera pose estimations. The proposed approach
uses sparse descriptors to regress the scene coordinates, instead of a dense
RGB image. The use of sparse features provides several advantages. First, the
proposed regressor network is substantially smaller than those reported in
previous studies. This makes our system highly efficient and scalable. Second,
the pre-built 3D models provide the most reliable and robust 2D-3D matches.
Therefore, learning from them can lead to an awareness of equivalent features
and substantially improve the generalization performance. A detailed analysis
of our approach and extensive evaluations using existing datasets are provided
to support the proposed method. Implementation details are available at
https://github.com/aislab/feat2map
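To make the core idea concrete, here is a minimal sketch (not the authors' code; layer sizes and the 256-D descriptor dimension are illustrative assumptions) of a small multi-layer perceptron that maps sparse local feature descriptors directly to 3D scene coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(dims):
    """He-initialised weights and biases for a stack of fully connected layers."""
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """ReLU MLP; the final layer is linear so the XYZ outputs are unbounded."""
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Assumed architecture: 256-D descriptor -> two hidden layers -> 3D coordinate.
mlp = init_mlp([256, 512, 512, 3])

descriptors = rng.standard_normal((100, 256))  # 100 sparse keypoint descriptors
scene_xyz = forward(mlp, descriptors)          # predicted 3D scene coordinates
print(scene_xyz.shape)                         # (100, 3)
```

In a full pipeline, the predicted 2D-3D correspondences (keypoint locations paired with the regressed coordinates) would be passed to a PnP solver with RANSAC, e.g. OpenCV's `solvePnPRansac`, to recover the 6-DoF camera pose. Because the network consumes only sparse descriptors rather than dense RGB images, it can remain far smaller than dense scene-coordinate regressors.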
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
- FaVoR: Features via Voxel Rendering for Camera Relocalization [23.7893950095252]
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
arXiv Detail & Related papers (2024-09-11T18:58:16Z)
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model that, once trained, can be deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with some sparse 2D pixel to 3D coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
- Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database.
Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
arXiv Detail & Related papers (2023-07-19T13:40:45Z)
- Deep Camera Pose Regression Using Pseudo-LiDAR [1.5959408994101303]
We show that converting depth maps into pseudo-LiDAR signals is a better representation for camera localization tasks.
We propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose.
arXiv Detail & Related papers (2022-02-28T20:30:37Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.