MeshLoc: Mesh-Based Visual Localization
- URL: http://arxiv.org/abs/2207.10762v2
- Date: Mon, 25 Jul 2022 08:08:59 GMT
- Title: MeshLoc: Mesh-Based Visual Localization
- Authors: Vojtech Panek, Zuzana Kukelova and Torsten Sattler
- Abstract summary: We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
- Score: 54.731309449883284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual localization, i.e., the problem of camera pose estimation, is a
central component of applications such as autonomous robots and augmented
reality systems. A dominant approach in the literature, shown to scale to large
scenes and to handle complex illumination and seasonal changes, is based on
local features extracted from images. The scene representation is a sparse
Structure-from-Motion point cloud that is tied to a specific local feature.
Switching to another feature type requires an expensive feature matching step
between the database images used to construct the point cloud. In this work, we
thus explore a more flexible alternative based on dense 3D meshes that does not
require feature matching between database images to build the scene
representation. We show that this approach can achieve state-of-the-art
results. We further show that surprisingly competitive results can be obtained
when extracting features on renderings of these meshes, without any neural
rendering stage, and even when rendering raw scene geometry without color or
texture. Our results show that dense 3D model-based representations are a
promising alternative to existing representations and point to interesting and
challenging directions for future research.
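Below is a minimal sketch of the pipeline the abstract describes, not the authors' released code: the mesh is assumed to have been rasterized from a nearby database pose (e.g., with pyrender or Open3D) into render_rgb and a per-pixel render_depth map, and the query pose is recovered from 2D-3D matches via PnP + RANSAC.
```python
import cv2
import numpy as np

def localize_query(query_img, render_rgb, render_depth, db_pose, K):
    """db_pose = (R, t): world-to-camera rotation and translation of the
    database view; K: 3x3 intrinsics shared by query and rendering."""
    sift = cv2.SIFT_create()
    kp_q, des_q = sift.detectAndCompute(query_img, None)
    kp_r, des_r = sift.detectAndCompute(render_rgb, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = [m for m, n in matcher.knnMatch(des_q, des_r, k=2)
               if m.distance < 0.8 * n.distance]  # Lowe's ratio test

    # Lift matched keypoints in the rendering to 3D via the depth map.
    R, t = db_pose
    K_inv = np.linalg.inv(K)
    pts3d, pts2d = [], []
    for m in matches:
        u, v = kp_r[m.trainIdx].pt
        z = render_depth[int(v), int(u)]
        if z <= 0:                       # pixel hit no geometry
            continue
        p_cam = z * (K_inv @ np.array([u, v, 1.0]))
        pts3d.append(R.T @ (p_cam - t))  # camera -> world coordinates
        pts2d.append(kp_q[m.queryIdx].pt)

    # Absolute pose of the query from the 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(pts3d), np.float32(pts2d), K, None, reprojectionError=8.0)
    return (rvec, tvec) if ok else None
```
Because the rendering step is decoupled from feature extraction, any feature type can be swapped in without rebuilding the scene representation, which is the flexibility argued for above.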
Related papers
- Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement [32.335953514942474]
This paper proposes to jointly learn the scene representation along with a 3D dense feature field and a 2D feature extractor.
We learn the underlying geometry of the scene with an implicit field through volumetric rendering and design our feature field to leverage intermediate geometric information encoded in the implicit field.
Visual localization is then achieved by aligning the image-based features and the rendered volumetric features.
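A minimal sketch (an assumption about the general technique, not the authors' code) of rendering a feature vector along one camera ray by alpha-compositing features from an implicit field, using the standard NeRF-style volume-rendering weights:
```python
import torch

def render_feature_along_ray(field, origin, direction, near=0.1, far=10.0, n_samples=64):
    """field(points) -> (density [N], feature [N, D]); returns a [D] feature."""
    t = torch.linspace(near, far, n_samples)                  # sample depths
    pts = origin + t[:, None] * direction                     # [N, 3] points
    sigma, feat = field(pts)                                  # density + features
    delta = torch.full_like(t, (far - near) / n_samples)      # interval lengths
    alpha = 1.0 - torch.exp(-sigma * delta)                   # opacity per sample
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)  # transmittance
    weights = alpha * trans                                   # compositing weights
    return (weights[:, None] * feat).sum(dim=0)               # rendered feature [D]
```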
arXiv Detail & Related papers (2024-06-12T17:51:53Z) - MeshVPR: Citywide Visual Place Recognition Using 3D Meshes [18.168206222895282]
Mesh-based scene representation offers a promising direction for simplifying large-scale hierarchical visual localization pipelines.
While existing work demonstrates the viability of meshes for visual localization, the impact of using synthetic databases rendered from them in visual place recognition remains largely unexplored.
We propose MeshVPR, a novel VPR pipeline that utilizes a lightweight feature alignment framework to bridge the gap between real-world and synthetic domains.
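One plausible form such a lightweight alignment step could take (an assumption, not MeshVPR's actual design) is a small residual MLP that maps descriptors extracted from synthetic renderings toward the real-image descriptor space, trained on paired real/synthetic views of the same places:
```python
import torch
import torch.nn as nn

class FeatureAligner(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, synth_desc):
        # Residual mapping keeps the aligned descriptor close to the input.
        out = synth_desc + self.net(synth_desc)
        return nn.functional.normalize(out, dim=-1)

def alignment_loss(aligner, synth_desc, real_desc):
    """MSE between aligned synthetic and L2-normalized real descriptors."""
    real = nn.functional.normalize(real_desc, dim=-1)
    return nn.functional.mse_loss(aligner(synth_desc), real)
```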
arXiv Detail & Related papers (2024-06-04T20:45:53Z)
- Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database.
Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
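A minimal sketch of the kind of motion-averaging step a structure-free pipeline like this builds on: given several absolute query-rotation estimates obtained from relative poses to different database images, take their chordal L2 mean by projecting the arithmetic mean onto SO(3) via SVD. The function name is an assumption.
```python
import numpy as np

def average_rotations(rotations):
    """Chordal L2 mean of a list of 3x3 rotation matrices."""
    M = sum(rotations)                      # arithmetic mean (up to scale)
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:                # keep det(R) = +1
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R
```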
arXiv Detail & Related papers (2023-07-19T13:40:45Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer [23.920690073252636]
We present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image.
The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments.
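A minimal sketch of how dense scene-coordinate predictions (the output of a network like this) yield a camera pose: sample pixel/coordinate pairs and run PnP + RANSAC. The predict_scene_coords callable is a hypothetical stand-in for the trained network.
```python
import cv2
import numpy as np

def pose_from_scene_coords(img, K, predict_scene_coords, n_samples=2000):
    coords = predict_scene_coords(img)            # [H, W, 3] world coordinates
    H, W, _ = coords.shape
    ys = np.random.randint(0, H, n_samples)
    xs = np.random.randint(0, W, n_samples)
    pts3d = coords[ys, xs].astype(np.float32)     # predicted 3D points
    pts2d = np.stack([xs, ys], axis=1).astype(np.float32)  # pixel (x, y)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None,
                                                 reprojectionError=4.0)
    return (rvec, tvec) if ok else None
```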
arXiv Detail & Related papers (2023-05-05T15:00:14Z)
- Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
- TopNet: Transformer-based Object Placement Network for Image Compositing [43.14411954867784]
Local clues in background images are important for determining whether an object can plausibly be placed at a given location and scale.
We propose to learn the correlation between object features and all local background features with a transformer module.
Our new formulation generates a 3D heatmap indicating the plausibility of all location/scale combinations in one network forward pass.
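A minimal sketch (an assumption, not TopNet's actual architecture) of scoring all location/scale combinations in one pass by correlating a global object embedding with dense background features at several scales:
```python
import torch
import torch.nn.functional as F

def placement_heatmap(obj_feat, bg_feats, scales=(0.5, 1.0, 2.0)):
    """obj_feat: [D]; bg_feats: [D, H, W]. Returns a [S, H, W] heatmap whose
    (s, y, x) entry scores placing the object at (x, y) with scale s."""
    maps = []
    for s in scales:
        # Resize the features to emulate evaluating a different object scale,
        # then take the dot product with the object embedding at every cell.
        f = F.interpolate(bg_feats[None], scale_factor=s, mode='bilinear',
                          align_corners=False)[0]             # [D, h, w]
        score = torch.einsum('d,dhw->hw', obj_feat, f)        # correlation map
        score = F.interpolate(score[None, None], size=bg_feats.shape[1:],
                              mode='bilinear', align_corners=False)[0, 0]
        maps.append(score)
    return torch.stack(maps)                                  # [S, H, W]
```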
arXiv Detail & Related papers (2023-04-06T20:58:49Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.