Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations
- URL: http://arxiv.org/abs/2408.11966v2
- Date: Sat, 19 Oct 2024 09:50:00 GMT
- Title: Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations
- Authors: Lintong Zhang, Yifu Tao, Jiarong Lin, Fu Zhang, Maurice Fallon,
- Abstract summary: We present a global visual localization system capable of localizing a single camera image across various 3D map representations.
Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs.
NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%.
- Score: 8.522160106746478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in mapping techniques have enabled the creation of highly accurate dense 3D maps during robotic missions, such as point clouds, meshes, or NeRF-based representations. These developments present new opportunities for reusing these maps for localization. However, there remains a lack of a unified approach that can operate seamlessly across different map representations. This paper presents and evaluates a global visual localization system capable of localizing a single camera image across various 3D map representations built using both visual and lidar sensing. Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs. Leveraging the precise 3D geometric map, our method automatically defines rendering poses, reducing the number of database images while preserving retrieval performance. To bridge the domain gap between real query camera images and synthetic database images, our approach utilizes learning-based descriptors and feature detectors. We evaluate the system's performance through extensive real-world experiments conducted in both indoor and outdoor settings, assessing the effectiveness of each map representation and demonstrating its advantages over traditional structure-from-motion (SfM) localization approaches. The results show that all three map representations can achieve consistent localization success rates of 55% and higher across various environments. NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%. Furthermore, we demonstrate an advantage over SfM-based approaches that our synthesized database enables localization in the reverse travel direction which is unseen during the mapping process. Our system, operating in real-time on a mobile laptop equipped with a GPU, achieves a processing rate of 1Hz.
Related papers
- LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation [5.739362282280063]
LiteVLoc is a visual localization framework that uses a lightweight topo-metric map to represent the environment.
It reduces storage overhead by leveraging learning-based feature matching and geometric solvers for metric pose estimation.
arXiv Detail & Related papers (2024-10-06T09:26:07Z) - MeshVPR: Citywide Visual Place Recognition Using 3D Meshes [18.168206222895282]
Mesh-based scene representation offers a promising direction for simplifying large-scale hierarchical visual localization pipelines.
While existing work demonstrates the viability of meshes for visual localization, the impact of using synthetic databases rendered from them in visual place recognition remains largely unexplored.
We propose MeshVPR, a novel VPR pipeline that utilizes a lightweight features alignment framework to bridge the gap between real-world and synthetic domains.
arXiv Detail & Related papers (2024-06-04T20:45:53Z) - RGBD GS-ICP SLAM [1.3108652488669732]
We propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS)
Experimental results demonstrate the effectiveness of our approach, showing an incredibly fast speed up to 107 FPS.
arXiv Detail & Related papers (2024-03-19T08:49:48Z) - 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization [13.868258945395326]
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting.
Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment.
arXiv Detail & Related papers (2024-03-17T23:06:12Z) - Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database.
Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
arXiv Detail & Related papers (2023-07-19T13:40:45Z) - SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and
Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z) - Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z) - TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view
Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z) - 3D Surfel Map-Aided Visual Relocalization with Learned Descriptors [15.608529165143718]
We introduce a method for visual relocalization using the geometric information from a 3D surfel map.
A visual database is first built by global indices from the 3D surfel map rendering, which provides associations between image points and 3D surfels.
A hierarchical camera relocalization algorithm then utilizes the visual database to estimate 6-DoF camera poses.
arXiv Detail & Related papers (2021-04-08T15:59:57Z) - Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.