Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation
- URL: http://arxiv.org/abs/2206.01916v1
- Date: Sat, 4 Jun 2022 06:29:46 GMT
- Title: Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation
- Authors: Gil Avraham, Julian Straub, Tianwei Shen, Tsun-Yi Yang, Hugo Germain,
Chris Sweeney, Vasileios Balntas, David Novotny, Daniel DeTone, Richard
Newcombe
- Abstract summary: Our proposed 3D scene representation, Nerfels, is locally dense yet globally sparse.
We adopt a feature-driven approach for representing scene-agnostic, local 3D patches with renderable codes.
Our model can be incorporated into existing state-of-the-art hand-crafted and learned local feature pose estimators, yielding improved performance when evaluated on ScanNet for wide camera baseline scenarios.
- Score: 21.111919718001907
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents a framework that combines traditional keypoint-based
camera pose optimization with an invertible neural rendering mechanism. Our
proposed 3D scene representation, Nerfels, is locally dense yet globally
sparse. As opposed to existing invertible neural rendering systems which
overfit a model to the entire scene, we adopt a feature-driven approach for
representing scene-agnostic, local 3D patches with renderable codes. By
modelling a scene only where local features are detected, our framework
effectively generalizes to unseen local regions in the scene via an optimizable
code conditioning mechanism in the neural renderer, all while maintaining the
low memory footprint of a sparse 3D map representation. Our model can be
incorporated into existing state-of-the-art hand-crafted and learned local
feature pose estimators, yielding improved performance when evaluated on
ScanNet for wide camera baseline scenarios.
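As a rough illustration of the keypoint-based camera pose optimization that the framework builds on, here is a minimal Python sketch that recovers a 6-DoF pose by minimizing keypoint reprojection error with SciPy. All names and the synthetic data are illustrative assumptions, not the paper's code; the Nerfels-specific rendering residual is indicated only by a comment.

```python
# Minimal sketch of keypoint-based pose optimization (illustrative only).
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def project(points_3d, rvec, tvec, K):
    """Project world points into the image under pose (rvec, tvec)."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = points_3d @ R.T + tvec       # world -> camera coordinates
    uv = cam @ K.T                     # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective division


def residuals(pose, points_3d, keypoints_2d, K):
    # A full Nerfels-style objective would also append photometric residuals
    # from patches rendered at each keypoint by the conditioned neural renderer.
    return (project(points_3d, pose[:3], pose[3:], K) - keypoints_2d).ravel()


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
points_3d = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))
true_pose = np.array([0.05, -0.02, 0.01, 0.1, -0.05, 0.2])
keypoints_2d = project(points_3d, true_pose[:3], true_pose[3:], K)

fit = least_squares(residuals, x0=np.zeros(6),
                    args=(points_3d, keypoints_2d, K))
print("recovered pose:", fit.x)        # should closely match true_pose
```

Presumably, the full framework augments such an objective with photometric residuals from the rendered local patches, keeping the map globally sparse while adding locally dense supervision.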
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
3D Gaussian Splatting (3DGS) allows for the compact encoding of both 3D geometry and scene appearance with its spatial features.
We propose distilling dense keypoint descriptors into 3DGS to improve the model's spatial understanding.
Our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.
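As a toy illustration of what distilling dense descriptors into a spatial 3D representation can look like, the sketch below matches a feature map rendered from the model against a teacher's dense descriptor map with a per-pixel cosine loss; the tensor shapes, names, and loss choice are assumptions, not GSplatLoc's actual pipeline.

```python
import torch
import torch.nn.functional as F

# Teacher: a dense keypoint descriptor map (channels, height, width).
teacher_desc = torch.randn(64, 120, 160)
# Student: feature map rendered from the 3D representation; in practice these
# come from per-primitive features splatted into the image, not random values.
rendered_feat = torch.randn(64, 120, 160, requires_grad=True)

# Per-pixel cosine-similarity distillation loss over the channel dimension.
loss = (1.0 - F.cosine_similarity(rendered_feat.flatten(1),
                                  teacher_desc.flatten(1), dim=0)).mean()
loss.backward()  # gradients flow back to the rendered (3D) features
print(float(loss))
```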
arXiv Detail & Related papers (2024-09-24T23:18:32Z)
- SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address limitations of existing localization approaches.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- NEWTON: Neural View-Centric Mapping for On-the-Fly Large-Scale SLAM [51.21564182169607]
NEWTON is a view-centric mapping method that dynamically constructs neural fields based on run-time observations.
Our method enables camera pose updates using loop closures and scene boundary updates by representing the scene with multiple neural fields.
The experimental results demonstrate the superior performance of our method over existing world-centric neural field-based SLAM systems.
arXiv Detail & Related papers (2023-03-23T20:22:01Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
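The volume rendering step that such MLP-conditioned methods perform typically follows the standard alpha-compositing quadrature; below is a minimal NumPy sketch with synthetic per-sample densities and colors (the names and inputs are illustrative, not this paper's implementation).

```python
import numpy as np


def composite_ray(sigmas, colors, deltas):
    """Alpha-composite densities and colors sampled along a single ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)         # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]
    weights = trans * alphas                        # per-sample contribution
    return (weights[:, None] * colors).sum(axis=0)  # rendered RGB


sigmas = np.array([0.1, 0.5, 2.0, 0.3])             # densities from the MLP
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
deltas = np.full(4, 0.25)                           # spacing between samples
print(composite_ray(sigmas, colors, deltas))
```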
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration [57.28608414782315]
We introduce a new, yet conceptually simple, neural architecture, termed SpinNet, to extract local features.
Experiments on both indoor and outdoor datasets demonstrate that SpinNet outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-11-24T15:00:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.