3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and Dataset
- URL: http://arxiv.org/abs/2304.06373v4
- Date: Wed, 8 Nov 2023 14:43:09 GMT
- Title: 3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and Dataset
- Authors: Matteo Toso, Matteo Taiana, Stuart James and Alessio Del Bue
- Abstract summary: We propose Flatlandia, a novel visual localization challenge.
We investigate whether a visual query can be localized by comparing the layout of common objects detected in it against the known spatial layout of objects in the map.
We formalize the challenge as two tasks at different levels of accuracy; for each, we propose initial baseline models and compare them against state-of-the-art 6DoF and 3DoF methods.
- Score: 20.986848597435728
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Efficient visual localization is crucial to many applications, such as
large-scale deployment of autonomous agents and augmented reality. Traditional
visual localization, while achieving remarkable accuracy, relies on extensive
3D models of the scene or large collections of geolocalized images, which are
often inefficient to store and to scale to novel environments. In contrast,
humans orient themselves using very abstract 2D maps, using the location of
clearly identifiable landmarks. Drawing on this and on the success of recent
works that explored localization on 2D abstract maps, we propose Flatlandia, a
novel visual localization challenge. With Flatlandia, we investigate whether a
visual query can be localized by comparing the layout of common objects
detected in it against the known spatial layout of objects in the map. We
formalize the challenge as two tasks at different levels of accuracy to
investigate the problem and its possible limitations; for each, we propose
initial baseline models and compare them against state-of-the-art 6DoF and 3DoF
methods. Code and dataset are publicly available at
github.com/IIT-PAVIS/Flatlandia.
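
To make the Flatlandia setting concrete, here is a minimal sketch of the matching problem: given the 2D positions and class labels of objects in a map, and the layout of objects detected in a query image, score candidate 3DoF poses (x, y, yaw) by how well the map objects, expressed in the candidate camera frame, align with the detected layout. This is an illustrative brute-force baseline under stated assumptions (camera-frame object positions from the query, a grid search over poses), not the paper's models.

```python
import numpy as np

# Hypothetical object map: (class label, 2D position in map coordinates).
MAP_OBJECTS = [
    ("tree",    np.array([2.0, 5.0])),
    ("bench",   np.array([4.0, 1.0])),
    ("hydrant", np.array([7.0, 3.0])),
]

# Hypothetical query detections: same classes, positions in the camera
# frame (e.g. from monocular depth + detection; an assumption here).
QUERY_OBJECTS = [
    ("tree",    np.array([-1.0, 3.0])),
    ("hydrant", np.array([2.5, 2.2])),
]

def pose_error(x, y, yaw):
    """Sum over query objects of the distance to the nearest same-class
    map object, after mapping map objects into the candidate camera frame."""
    c, s = np.cos(yaw), np.sin(yaw)
    world_to_cam = np.array([[c, s], [-s, c]])  # R(yaw)^T
    t = np.array([x, y])
    err = 0.0
    for q_cls, q_pos in QUERY_OBJECTS:
        dists = [np.linalg.norm(world_to_cam @ (m_pos - t) - q_pos)
                 for m_cls, m_pos in MAP_OBJECTS if m_cls == q_cls]
        err += min(dists) if dists else 10.0  # penalty: class absent from map
    return err

# Coarse exhaustive search over the 3DoF pose space, for illustration only.
candidates = [(x, y, yaw)
              for x in np.linspace(0.0, 8.0, 33)
              for y in np.linspace(0.0, 6.0, 25)
              for yaw in np.linspace(-np.pi, np.pi, 36)]
best = min(candidates, key=lambda p: pose_error(*p))
print("estimated (x, y, yaw):", best)
```

A real baseline would replace the grid search with a learned or graph-matching solver and handle detection noise, ambiguous classes, and missing objects; the sketch only fixes the geometry of the problem.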
Related papers
- Multiview Scene Graph [7.460438046915524]
A proper scene representation is central to the pursuit of spatial intelligence.
We propose to build Multiview Scene Graphs (MSG) from unposed images.
MSG represents a scene topologically with interconnected place and object nodes (a toy graph sketch follows this entry).
arXiv Detail & Related papers (2024-10-15T02:04:05Z)
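
As a minimal illustration of the topological representation described above, the sketch below builds a graph with place and object nodes; the node types and linking rules are assumptions for illustration, not MSG's actual data structure.

```python
from collections import defaultdict

class SceneGraph:
    """Toy topological scene graph: nodes are ('place', id) or
    ('object', id) tuples; edges are undirected."""
    def __init__(self):
        self.adj = defaultdict(set)

    def connect(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

g = SceneGraph()
# Place nodes come from (unposed) images; object nodes from detections.
g.connect(("place", "img_001"), ("place", "img_002"))   # co-visible places
g.connect(("place", "img_001"), ("object", "chair_0"))  # object observed here
g.connect(("place", "img_002"), ("object", "chair_0"))  # same chair re-identified
print(sorted(g.adj[("object", "chair_0")]))             # places seeing chair_0
```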
- SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters.
Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
- Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations [8.522160106746478]
We present a global visual localization system capable of localizing a single camera image across various 3D map representations.
Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs.
NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%.
arXiv Detail & Related papers (2024-08-21T19:37:17Z)
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds MMScan, the largest multi-modal 3D scene dataset and benchmark to date with hierarchical grounded language annotations.
The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z)
- GLACE: Global Local Accelerated Coordinate Encoding [66.87005863868181]
Scene coordinate regression (SCR) methods are effective in small-scale scenes but face significant challenges in large-scale scenes.
We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network.
Our method achieves state-of-the-art results on large-scale scenes with a low-map-size model (a generic SCR sketch follows this entry).
arXiv Detail & Related papers (2024-06-06T17:59:50Z)
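
For readers unfamiliar with SCR, the pipeline GLACE builds on can be summarized in a few lines: a network regresses a 3D scene coordinate for each sampled pixel, and the camera pose is recovered from the resulting 2D-3D correspondences with PnP inside a RANSAC loop. The sketch below uses a synthetic stand-in regressor and OpenCV's generic solver; GLACE's global/local encodings are not reproduced.

```python
import numpy as np
import cv2  # pip install opencv-python

def regress_scene_coords(pixels):
    """Stand-in for a trained SCR network: maps (u, v) pixels to 3D scene
    coordinates. Here a synthetic planar scene, so PnP has an exact answer."""
    uv = np.asarray(pixels, dtype=np.float64)
    return np.column_stack([0.01 * uv[:, 0], 0.01 * uv[:, 1],
                            np.full(len(uv), 5.0)])

# Sample pixel locations in a 640x480 query image.
pixels = np.array([[u, v] for u in range(40, 640, 80)
                          for v in range(40, 480, 80)], dtype=np.float64)
scene_pts = regress_scene_coords(pixels)

K = np.array([[525.0,   0.0, 320.0],   # assumed pinhole intrinsics
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Robust pose from 2D-3D correspondences, as in standard SCR systems.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(scene_pts, pixels, K, None)
if ok:
    print("rotation (Rodrigues vector):", rvec.ravel())
    print("camera translation:", tvec.ravel())
```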
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- Image-based Geolocalization by Ground-to-2.5D Map Matching [21.21416396311102]
Methods often utilize cross-view localization techniques to match ground-view query images with 2D maps.
We propose a new approach to learning representative embeddings from multi-modal data.
By encoding crucial geometric cues, our method learns discriminative location embeddings for matching panoramic images and maps (a generic retrieval sketch follows this entry).
arXiv Detail & Related papers (2023-08-11T08:00:30Z)
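
The matching stage implied by "discriminative location embeddings" reduces, at inference time, to nearest-neighbor search in a shared embedding space. The sketch below shows that generic retrieval step with random stand-in encoders; the paper's multi-modal networks and 2.5D map encoding are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(img):      # stand-in for the panorama encoder
    return rng.standard_normal(128)

def encode_map_tile(tile):  # stand-in for the 2.5D map-tile encoder
    return rng.standard_normal(128)

# Offline: embed every map tile once; each tile carries a known location.
tiles = [{"loc": (i, j)} for i in range(10) for j in range(10)]
db = np.stack([encode_map_tile(t) for t in tiles])
db /= np.linalg.norm(db, axis=1, keepdims=True)

# Online: embed the query and retrieve the most similar tile.
q = encode_image("query_panorama.jpg")
q /= np.linalg.norm(q)
best = int(np.argmax(db @ q))  # cosine-similarity ranking
print("predicted location:", tiles[best]["loc"])
```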
- Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
- CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization [44.97567243883994]
We propose a new benchmark for visual localization in outdoor scenes using crowd-sourced data.
We show that our dataset is very challenging, with all evaluated methods failing on its hardest parts.
As part of the dataset release, we provide the tooling used to generate it, enabling efficient and effective 2D correspondence annotation.
arXiv Detail & Related papers (2021-09-09T19:25:48Z)
- Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization [21.375640354558044]
State estimation approaches based on local sensors are prone to drift on long-range missions as errors accumulate.
We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources.
We propose a framework that learns cross-scale visual representations without supervision.
arXiv Detail & Related papers (2021-09-09T08:08:54Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns region-wise local consistency in the latent feature space.
Our PGL model learns distinctive representations of local regions and is hence able to retain structural information (a toy consistency-loss sketch follows this entry).
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
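
A region-wise local consistency objective, as named in the PGL summary, can be illustrated with a toy loss: features pooled from corresponding local regions of two augmented views are pulled together. The sampling scheme and shapes below are simplified assumptions, not the PGL model.

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Feature maps from two augmentations of the same volume: (C, D, H, W).
feat_a = rng.standard_normal((32, 8, 16, 16))
feat_b = feat_a + 0.1 * rng.standard_normal((32, 8, 16, 16))  # mild perturbation

def local_consistency_loss(fa, fb, n_regions=16, size=4):
    """Average (1 - cosine) over pooled features of matching local regions."""
    loss = 0.0
    for _ in range(n_regions):
        d = rng.integers(0, fa.shape[1])          # random slice
        h = rng.integers(0, fa.shape[2] - size)   # random region corner
        w = rng.integers(0, fa.shape[3] - size)
        ra = fa[:, d, h:h + size, w:w + size].mean(axis=(1, 2))
        rb = fb[:, d, h:h + size, w:w + size].mean(axis=(1, 2))
        loss += 1.0 - cosine(ra, rb)
    return loss / n_regions

print("local consistency loss:", local_consistency_loss(feat_a, feat_b))
```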
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.