Visual Localization using Imperfect 3D Models from the Internet
- URL: http://arxiv.org/abs/2304.05947v1
- Date: Wed, 12 Apr 2023 16:15:05 GMT
- Title: Visual Localization using Imperfect 3D Models from the Internet
- Authors: Vojtech Panek, Zuzana Kukelova, Torsten Sattler
- Abstract summary: This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet are a promising, easy-to-obtain scene representation.
- Score: 54.731309449883284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual localization is a core component in many applications, including
augmented reality (AR). Localization algorithms compute the camera pose of a
query image w.r.t. a scene representation, which is typically built from
images. This often requires capturing and storing large amounts of data,
followed by running Structure-from-Motion (SfM) algorithms. An interesting, and
underexplored, source of data for building scene representations is 3D models
that are readily available on the Internet, e.g., hand-drawn CAD models, 3D
models generated from building footprints, or from aerial images. These models
make it possible to perform visual localization right away, without the
time-consuming scene capturing and model building steps. Yet, they also come
with challenges, as the available 3D models are often imperfect reflections of
reality. E.g., the models might have only generic textures or no textures at
all, might provide only a simple approximation of the scene geometry, or might
be stretched. This paper
studies how the imperfections of these models affect localization accuracy. We
create a new benchmark for this task and provide a detailed experimental
evaluation based on multiple 3D models per scene. We show that 3D models from
the Internet are a promising, easy-to-obtain scene representation. At the same
time, there is significant room for improvement in visual localization
pipelines. To foster research on this interesting and challenging task, we
release our benchmark at v-pnk.github.io/cadloc.
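
The core geometric step behind such localization pipelines is standard: given 2D-3D correspondences between query-image pixels and points on the scene model, the camera pose is usually recovered with a PnP solver inside a RANSAC loop. Below is a minimal sketch of that step using OpenCV; the correspondence arrays, intrinsics, and thresholds are illustrative placeholders, not values from the paper.

```python
# Minimal sketch: camera pose from 2D-3D matches via PnP + RANSAC (OpenCV).
# The arrays below are hypothetical placeholders; a real pipeline would obtain
# them by matching query-image features against the scene model (e.g., against
# renderings of a CAD model downloaded from the Internet).
import cv2
import numpy as np

pts_3d = np.random.rand(100, 3).astype(np.float64)        # model points (world frame)
pts_2d = np.random.rand(100, 2).astype(np.float64) * 640  # matched pixel locations

K = np.array([[800.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(4)  # assume no lens distortion

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts_3d, pts_2d, K, dist,
    reprojectionError=8.0,   # pixel threshold; looser values tolerate model error
    iterationsCount=1000,
)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # world-to-camera rotation matrix
    print("camera pose: R =", R, "t =", tvec.ravel(),
          "inliers:", 0 if inliers is None else len(inliers))
```

With imperfect Internet models, the difficult part is presumably producing reliable 2D-3D matches in the first place, e.g., against untextured or stretched geometry; how well current pipelines cope with that is what the benchmark above measures.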
Related papers
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models [87.80820708758317]
We present CAT3D, a method for creating anything in 3D by simulating the real-world capture process with a multi-view diffusion model.
CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation.
arXiv Detail & Related papers (2024-05-16T17:59:05Z)
- Probing the 3D Awareness of Visual Foundation Models [56.68380136809413]
We analyze the 3D awareness of visual foundation models.
We conduct experiments using task-specific probes and zero-shot inference procedures on frozen features.
arXiv Detail & Related papers (2024-04-12T17:58:04Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content with higher quality by exploiting the natural image prior of a 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation tasks and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training [105.3421541518582]
Current successful methods for 3D scene perception rely on large-scale annotated point clouds.
We propose Model2Scene, a novel paradigm that learns label-free 3D scene representations from Computer-Aided Design (CAD) models and language.
Model2Scene yields impressive label-free 3D salient object detection, with average mAPs of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively.
arXiv Detail & Related papers (2023-09-29T03:51:26Z)
- SACReg: Scene-Agnostic Coordinate Regression for Visual Localization [16.866303169903237]
We propose a generalized SCR model that is trained once and then deployed in new test scenes, regardless of their scale, without any finetuning.
Instead of encoding the scene coordinates into the network weights, our model takes as input a database image with sparse 2D-pixel-to-3D-coordinate annotations.
We show that the database representation of images and their 2D-3D annotations can be highly compressed with negligible loss of localization performance.
arXiv Detail & Related papers (2023-07-21T16:56:36Z)
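
To make the SACReg entry above concrete, here is a small, purely illustrative sketch of what a "database image with sparse 2D-3D annotations" could look like and why such a representation compresses well; the record layout, field names, and float16 quantization are assumptions for illustration, not SACReg's actual format.

```python
# Illustrative sketch (not SACReg's actual format): a database entry holding a
# few hundred sparse 2D-3D annotations per image. Quantizing to float16 is one
# simple way such a representation can be compressed with little accuracy loss.
import numpy as np

def make_entry(image_id: str, pix_2d: np.ndarray, xyz_3d: np.ndarray) -> dict:
    """pix_2d: (N, 2) pixel coordinates; xyz_3d: (N, 3) scene coordinates."""
    assert pix_2d.shape[0] == xyz_3d.shape[0]
    return {
        "image_id": image_id,
        "pix": pix_2d.astype(np.float16),  # pixel coords fit easily in float16
        "xyz": xyz_3d.astype(np.float16),  # lossy but compact scene coords
    }

entry = make_entry("db_0001",
                   np.random.rand(300, 2) * 1024,   # hypothetical pixel coords
                   np.random.rand(300, 3) * 50.0)   # hypothetical scene coords
n_bytes = entry["pix"].nbytes + entry["xyz"].nbytes
print(f"{n_bytes} bytes for 300 annotations")       # half the float32 footprint
```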
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.