FaVoR: Features via Voxel Rendering for Camera Relocalization
- URL: http://arxiv.org/abs/2409.07571v1
- Date: Wed, 11 Sep 2024 18:58:16 GMT
- Title: FaVoR: Features via Voxel Rendering for Camera Relocalization
- Authors: Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly,
- Abstract summary: Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
- Score: 23.7893950095252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z) - Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer [21.832249148699397]
We address the task of estimating camera parameters from a set of images depicting a scene.
We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images.
arXiv Detail & Related papers (2024-04-22T17:02:33Z) - SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z) - Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z) - Multi-View Object Pose Refinement With Differentiable Renderer [22.040014384283378]
This paper introduces a novel multi-view 6 DoF object pose refinement approach focusing on improving methods trained on synthetic data.
It is based on the DPOD detector, which produces dense 2D-3D correspondences between the model vertices and the image pixels in each frame.
We report excellent performance in comparison to the state-of-the-art methods trained on the synthetic and real data.
arXiv Detail & Related papers (2022-07-06T17:02:22Z) - ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose.
(ImPosing) embeds images and camera poses into a common latent representation with 2 separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
arXiv Detail & Related papers (2022-05-05T13:33:25Z) - Pixel-Perfect Structure-from-Motion with Featuremetric Refinement [96.73365545609191]
We refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views.
This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors.
Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
arXiv Detail & Related papers (2021-08-18T17:58:55Z) - Reassessing the Limitations of CNN Methods for Camera Pose Regression [27.86655424544118]
We propose a model that can directly regress the camera pose from images with significantly higher accuracy than existing methods of the same class.
We first analyse why regression methods are still behind the state-of-the-art, and we bridge the performance gap with our new approach.
arXiv Detail & Related papers (2021-08-16T17:55:26Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z) - Multi-View Optimization of Local Feature Geometry [70.18863787469805]
We address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry.
Our proposed method naturally complements the traditional feature extraction and matching paradigm.
We show that our method consistently improves the triangulation and camera localization performance for both hand-crafted and learned local features.
arXiv Detail & Related papers (2020-03-18T17:22:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.