RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale
Indoor Environments
- URL: http://arxiv.org/abs/2207.12579v1
- Date: Tue, 26 Jul 2022 00:08:43 GMT
- Title: RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale
Indoor Environments
- Authors: Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui,
Zilong Dong, Siyu Zhu, and Ping Tan
- Abstract summary: We propose a virtual view synthesis-based approach, RenderNet, to enrich the database and refine poses in this scenario.
The proposed method substantially improves performance in large-scale indoor environments, achieving improvements of 7.1% and 12.2% on the InLoc dataset.
- Score: 36.91498676137178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual relocalization has been a widely discussed problem in 3D vision: given
a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query
image is estimated. Relocalization in large-scale indoor environments enables
attractive applications such as augmented reality and robot navigation.
However, the appearance of such environments changes rapidly as the camera
moves, which is challenging for a relocalization system. To address this
problem, we propose a virtual view synthesis-based approach, RenderNet, to
enrich the database and refine poses in this scenario. Instead of rendering
real images, which requires high-quality 3D models, we opt to directly render
the needed global and local features of virtual viewpoints and apply them in
the subsequent image retrieval and feature matching steps, respectively. The
proposed method substantially improves performance in large-scale indoor
environments, e.g., achieving improvements of 7.1% and 12.2% on the InLoc
dataset.
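For context, the sketch below outlines the generic retrieval-then-matching relocalization pipeline that the abstract refers to, with the 6 DoF pose recovered by PnP with RANSAC. It is an illustration rather than RenderNet's implementation: the database layout, helper names (retrieve_top_k, estimate_pose, relocalize), and thresholds are assumptions, and the global and local features are taken as precomputed (in RenderNet they would be rendered for the virtual viewpoints).

```python
# Illustrative retrieval -> matching -> PnP relocalization loop (not RenderNet itself).
# Each database entry is assumed to hold a global descriptor, local descriptors
# (float32), and the 3D point associated with each local feature.
import numpy as np
import cv2


def retrieve_top_k(query_global, db_globals, k=5):
    """Rank database views (real and virtual) by global-descriptor cosine similarity."""
    sims = db_globals @ query_global / (
        np.linalg.norm(db_globals, axis=1) * np.linalg.norm(query_global) + 1e-8)
    return np.argsort(-sims)[:k]


def estimate_pose(query_kpts, query_desc, view, K):
    """Match query local features against one retrieved view and solve PnP."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(query_desc, view["desc"], k=2)
    # Lowe's ratio test to keep only distinctive matches.
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]
    if len(good) < 6:
        return None
    pts2d = np.float32([query_kpts[m.queryIdx] for m in good])        # query pixels (x, y)
    pts3d = np.float32([view["points3d"][m.trainIdx] for m in good])  # matched 3D points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=8.0, iterationsCount=1000)
    return (rvec, tvec, len(inliers)) if ok else None


def relocalize(query_global, query_kpts, query_desc, database, K):
    """Try the retrieved candidates in order until a pose with enough inliers is found."""
    db_globals = np.stack([v["global"] for v in database])
    for idx in retrieve_top_k(query_global, db_globals):
        result = estimate_pose(query_kpts, query_desc, database[idx], K)
        if result is not None and result[2] > 30:  # illustrative inlier threshold
            return result
    return None
```

RenderNet's contribution lies in how the database side of this loop is populated: instead of rendering photorealistic images for the virtual viewpoints, which would require a high-quality 3D model, it renders their global and local features directly, so the retrieval and matching steps above can consume them unchanged.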
Related papers
- SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
- Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations [8.522160106746478]
We present a global visual localization system capable of localizing a single camera image across various 3D map representations.
Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs.
NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%.
arXiv Detail & Related papers (2024-08-21T19:37:17Z)
- Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database.
Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
arXiv Detail & Related papers (2023-07-19T13:40:45Z)
- Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering; a generic volume-rendering sketch is given after this list.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation [21.111919718001907]
Our proposed 3D scene representation, Nerfels, is locally dense yet globally sparse.
We adopt a feature-driven approach for representing scene-agnostic, local 3D patches with renderable codes.
Our model can be incorporated into existing state-of-the-art hand-crafted and learned local feature estimators, yielding improved performance when evaluated on ScanNet in wide camera baseline scenarios.
arXiv Detail & Related papers (2022-06-04T06:29:46Z)
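The Vision Transformer entry above mentions training an MLP conditioned on a learned 3D representation to perform volume rendering. The sketch below shows only the standard NeRF-style compositing rule along a single camera ray, assuming a hypothetical mlp callable that maps sample positions to densities and colors; it is not the code of any paper listed here.

```python
# Generic NeRF-style volume rendering along one ray (illustrative only).
import numpy as np


def render_ray(mlp, ray_o, ray_d, near=0.5, far=6.0, n_samples=64):
    """Composite colors along a ray with the standard volume rendering weights."""
    t = np.linspace(near, far, n_samples)                # sample depths along the ray
    pts = ray_o[None, :] + t[:, None] * ray_d[None, :]   # (n_samples, 3) sample points
    sigma, rgb = mlp(pts)                                # densities (n,), colors (n, 3)
    delta = np.append(np.diff(t), 1e10)                  # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)                 # per-segment opacity
    # Transmittance: probability the ray reaches sample i without being absorbed.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1] + 1e-10)))
    weights = alpha * trans
    color = (weights[:, None] * rgb).sum(axis=0)         # expected color
    depth = (weights * t).sum()                          # expected depth
    return color, depth, weights
```

Rendering a depth value alongside the color is what lets novel-view synthesis also serve as a localization database generator producing RGB and depth image pairs, as described in the map-representation comparison entry above.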