OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
- URL: http://arxiv.org/abs/2304.02009v1
- Date: Tue, 4 Apr 2023 17:59:03 GMT
- Title: OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
- Authors: Paul-Edouard Sarlin, Daniel DeTone, Tsun-Yi Yang, Armen Avetisyan,
Julian Straub, Tomasz Malisiewicz, Samuel Rota Bulo, Richard Newcombe, Peter
Kontschieder, Vasileios Balntas
- Abstract summary: OrienterNet is the first deep neural network that can localize an image with sub-meter accuracy using the same 2D semantic maps that humans use.
OrienterNet estimates the location and orientation of a query image by matching a neural Bird's-Eye View with open and globally available maps from OpenStreetMap.
To enable this, we introduce a large crowd-sourced dataset of images captured across 12 cities from the diverse viewpoints of cars, bikes, and pedestrians.
- Score: 21.673020132276573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can orient themselves in their 3D environments using simple 2D maps.
In contrast, algorithms for visual localization mostly rely on complex 3D point
clouds that are expensive to build, store, and maintain over time. We bridge
this gap by introducing OrienterNet, the first deep neural network that can
localize an image with sub-meter accuracy using the same 2D semantic maps that
humans use. OrienterNet estimates the location and orientation of a query image
by matching a neural Bird's-Eye View with open and globally available maps from
OpenStreetMap, enabling anyone to localize anywhere such maps are available.
OrienterNet is supervised only by camera poses but learns to perform semantic
matching with a wide range of map elements in an end-to-end manner. To enable
this, we introduce a large crowd-sourced dataset of images captured across 12
cities from the diverse viewpoints of cars, bikes, and pedestrians. OrienterNet
generalizes to new datasets and pushes the state of the art in both robotics
and AR scenarios. The code and trained model will be released publicly.
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding [57.108301842535894]
We introduce SNAP, a deep network that learns rich neural 2D maps from ground-level and overhead images.
We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images.
SNAP can resolve the location of challenging image queries beyond the reach of traditional methods.
arXiv Detail & Related papers (2023-06-08T17:54:47Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- 3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and Dataset [20.986848597435728]
We propose Flatlandia, a novel visual localization challenge.
We investigate whether a visual query can be localized by comparing the layout of the common objects detected in it against the known spatial layout of objects in the map.
For each task, we propose initial baseline models and compare them against state-of-the-art 6DoF and 3DoF methods.
arXiv Detail & Related papers (2023-04-13T09:53:09Z)
- Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
- Bidirectional Projection Network for Cross Dimension Scene Understanding [69.29443390126805]
We present a bidirectional projection network (BPNet) for joint 2D and 3D reasoning in an end-to-end manner.
Via the BPM, complementary 2D and 3D information can interact with each other at multiple architectural levels.
Our BPNet achieves top performance on the ScanNetV2 benchmark for both 2D and 3D semantic segmentation.
arXiv Detail & Related papers (2021-03-26T08:31:39Z)
- Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach [10.610403488989428]
We propose a framework that estimates the 3D positions of semantically meaningful landmarks without assuming known camera intrinsics.
We utilize multi-view geometry as well as deep learning based self-calibration, depth, and ego-motion estimation for traffic sign positioning.
We achieve an average single-journey relative and absolute positioning accuracy of 39 cm and 1.26 m, respectively.
arXiv Detail & Related papers (2020-07-25T12:10:16Z)
- 3D Crowd Counting via Geometric Attention-guided Multi-View Fusion [50.520192402702015]
We propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps.
Compared to 2D fusion, 3D fusion extracts more information about people along the z-dimension (height), which helps to address scale variations across multiple views.
The 3D density maps preserve the property of 2D density maps that their sum equals the count, while also providing 3D information about the crowd density (see the sketch after this list).
arXiv Detail & Related papers (2020-03-18T11:35:11Z)
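To make the sum-equals-count property mentioned in the last entry concrete, here is a small hypothetical NumPy check (not code from that paper): each annotated person is rendered as a unit-mass Gaussian blob, so summing the density map, whether 2D or 3D, recovers the head count.

```python
# Hypothetical illustration: density maps of any spatial rank sum to the count.
import numpy as np

def gaussian_density(points, shape, sigma=2.0):
    """Render a unit-mass Gaussian blob at each annotated point; works for any spatial rank."""
    axes = [np.arange(s) for s in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # shape + (ndim,)
    density = np.zeros(shape, dtype=np.float64)
    for p in points:
        sq_dist = np.sum((grid - np.asarray(p, dtype=np.float64)) ** 2, axis=-1)
        blob = np.exp(-sq_dist / (2.0 * sigma ** 2))
        density += blob / blob.sum()  # normalize so each person contributes exactly 1
    return density

people_2d = [(10, 12), (30, 5), (22, 40)]           # (y, x) head locations
people_3d = [(10, 12, 3), (30, 5, 6), (22, 40, 1)]  # (y, x, z) adds height
print(gaussian_density(people_2d, (48, 48)).sum())     # ~3.0: the sum is the count
print(gaussian_density(people_3d, (48, 48, 8)).sum())  # ~3.0: count preserved in 3D
```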