Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images
- URL: http://arxiv.org/abs/2301.04224v2
- Date: Sun, 9 Apr 2023 21:30:05 GMT
- Title: Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images
- Authors: Xindi Wu, KwunFung Lau, Francesco Ferroni, Aljoša Ošep, Deva Ramanan
- Abstract summary: We introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images.
This problem can be posed as cross-modal retrieval by learning a joint, cross-modal embedding space for images and existing maps.
We show that our retrieved maps can be used to update or expand existing maps, and even show proof-of-concept results for visual localization and image retrieval from spatial graphs.
- Score: 42.05213970259352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-driving vehicles rely on urban street maps for autonomous navigation. In
this paper, we introduce Pix2Map, a method for inferring urban street map
topology directly from ego-view images, as needed to continually update and
expand existing maps. This is a challenging task, as we need to infer a complex
urban road topology directly from raw image data. The main insight of this
paper is that this problem can be posed as cross-modal retrieval by learning a
joint, cross-modal embedding space for images and existing maps, represented as
discrete graphs that encode the topological layout of the visual surroundings.
We conduct our experimental evaluation using the Argoverse dataset and show
that it is indeed possible to accurately retrieve street maps corresponding to
both seen and unseen roads solely from image data. Moreover, we show that our
retrieved maps can be used to update or expand existing maps and even show
proof-of-concept results for visual localization and image retrieval from
spatial graphs.
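The retrieval step described in the abstract, scoring candidate map graphs against an image in a shared embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the trained image and graph encoders are replaced here by random vectors, and only the nearest-neighbour lookup by cosine similarity is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed embeddings. In Pix2Map these would come from
# an image encoder and a graph encoder trained jointly; random vectors
# stand in for them purely to illustrate the retrieval step.
EMBED_DIM = 64
NUM_MAPS = 100

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

map_embeddings = l2_normalize(rng.normal(size=(NUM_MAPS, EMBED_DIM)))

def retrieve_maps(image_embedding, map_embeddings, k=5):
    """Return indices of the k map graphs closest to the image
    embedding by cosine similarity (dot product of unit vectors)."""
    query = l2_normalize(image_embedding)
    scores = map_embeddings @ query
    return np.argsort(-scores)[:k]

# A query embedding slightly perturbed from map 42 should rank it first.
query = map_embeddings[42] + 0.01 * rng.normal(size=EMBED_DIM)
top_k = retrieve_maps(query, map_embeddings)
print(top_k[0])  # map 42 is the nearest neighbour
```

In the actual system the embeddings on both sides come from learned encoders, so retrieval quality depends entirely on how well the joint space aligns images with their corresponding graph topologies.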
Related papers
- CartoMark: a benchmark dataset for map pattern recognition and map content retrieval with machine intelligence [9.652629004863364]
We develop a large-scale benchmark dataset for map text annotation recognition, map scene classification, map super-resolution reconstruction, and map style transferring.
These well-labelled datasets would facilitate the state-of-the-art machine intelligence technologies to conduct map feature detection, map pattern recognition and map content retrieval.
arXiv Detail & Related papers (2023-12-14T01:54:38Z)
- SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding [57.108301842535894]
We introduce SNAP, a deep network that learns rich neural 2D maps from ground-level and overhead images.
We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images.
SNAP can resolve the location of challenging image queries beyond the reach of traditional methods.
arXiv Detail & Related papers (2023-06-08T17:54:47Z)
- Dataset of Pathloss and ToA Radio Maps With Localization Application [64.57766973771004]
The datasets include simulated pathloss/received signal strength (RSS) and time of arrival (ToA) radio maps over a large collection of realistic dense urban settings in real city maps.
The two main applications of the presented dataset are 1) learning methods that predict the pathloss from input city maps, and 2) wireless localization.
The fact that the RSS and ToA maps are computed by the same simulations over the same city maps allows for a fair comparison of the RSS and ToA-based localization methods.
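The ToA-based localization this dataset targets can be sketched with standard multilateration: each ToA measurement gives a range to a known anchor, and the transmitter position follows from a least-squares solve. The anchor layout and positions below are invented for illustration; only the linearisation trick is the point.

```python
import numpy as np

# Hypothetical anchor (base-station) positions in metres; a ToA
# measurement gives the range to each anchor (range = c * time).
anchors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
true_pos = np.array([30.0, 60.0])
distances = np.linalg.norm(anchors - true_pos, axis=1)

def toa_localize(anchors, distances):
    """Least-squares multilateration: subtracting the first range
    equation ||x - a_i||^2 = d_i^2 cancels the quadratic term in x,
    leaving a linear system A x = b."""
    a0, d0 = anchors[0], distances[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (d0**2 - distances[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    return np.linalg.lstsq(A, b, rcond=None)[0]

est = toa_localize(anchors, distances)
print(np.round(est, 2))  # recovers [30. 60.] with noise-free ranges
```

With noisy ranges the same solve returns the least-squares estimate; RSS-based localization instead maps signal strength to distance through a pathloss model, which is exactly the comparison the dataset enables.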
arXiv Detail & Related papers (2022-11-18T20:39:51Z)
- A Survey on Visual Map Localization Using LiDARs and Cameras [0.0]
We define visual map localization as a two-stage process.
At the stage of place recognition, the initial position of the vehicle in the map is determined by comparing the visual sensor output with a set of geo-tagged map regions of interest.
At the stage of map metric localization, the vehicle is tracked while it moves across the map by continuously aligning the visual sensors' output with the current area of the map that is being traversed.
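The first of the two stages, place recognition, reduces to a nearest-neighbour search over descriptors of geo-tagged map regions. The sketch below uses random vectors as stand-ins for descriptors produced by a real visual frontend; the region layout is likewise invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical geo-tagged map regions: each has a descriptor (here a
# random stand-in for real visual features) and a centre in metres.
NUM_REGIONS, DIM = 50, 32
region_descriptors = rng.normal(size=(NUM_REGIONS, DIM))
region_centres = rng.uniform(0, 1000, size=(NUM_REGIONS, 2))

def recognize_place(query_descriptor):
    """Stage 1: return the centre of the map region whose descriptor
    is closest (Euclidean distance) to the current sensor output."""
    dists = np.linalg.norm(region_descriptors - query_descriptor, axis=1)
    return region_centres[np.argmin(dists)]

# A query taken near region 7 initialises the vehicle at region 7's
# centre; stage 2 (metric localization) would then refine and track
# this pose by continuous alignment against the local map.
query = region_descriptors[7] + 0.05 * rng.normal(size=DIM)
init_pos = recognize_place(query)
print(init_pos)
```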
arXiv Detail & Related papers (2022-08-05T20:11:18Z)
- VectorMapNet: End-to-end Vectorized HD Map Learning [18.451587680552464]
We introduce an end-to-end vectorized HD map learning pipeline, termed VectorMapNet.
This pipeline can explicitly model the spatial relation between map elements and generate vectorized maps friendly to downstream autonomous driving tasks.
Experiments show that VectorMapNet achieves strong map learning performance on both the nuScenes and Argo2 datasets.
arXiv Detail & Related papers (2022-06-17T17:57:13Z)
- csBoundary: City-scale Road-boundary Detection in Aerial Images for High-definition Maps [10.082536828708779]
We propose csBoundary to automatically detect road boundaries at the city scale for HD map annotation.
Our network takes as input an aerial image patch, and directly infers the continuous road-boundary graph from this image.
Our csBoundary is evaluated and compared on a public benchmark dataset.
arXiv Detail & Related papers (2021-11-11T02:04:36Z)
- Semantic Image Alignment for Vehicle Localization [111.59616433224662]
We present a novel approach to vehicle localization in dense semantic maps using semantic segmentation from a monocular camera.
In contrast to existing visual localization approaches, the system does not require additional keypoint features, handcrafted localization landmark extractors or expensive LiDAR sensors.
arXiv Detail & Related papers (2021-10-08T14:40:15Z)
- DAGMapper: Learning to Map by Discovering Lane Topology [84.12949740822117]
We focus on drawing the lane boundaries of complex highways with many lanes that contain topology changes due to forks and merges.
We formulate the problem as inference in a directed acyclic graphical model (DAG), where the nodes of the graph encode geometric and topological properties of the local regions of the lane boundaries.
We show the effectiveness of our approach on two major North American highways in two different states, achieving high precision and recall as well as 89% correct topology.
arXiv Detail & Related papers (2020-12-22T21:58:57Z)
- Learning Lane Graph Representations for Motion Forecasting [92.88572392790623]
We construct a lane graph from raw map data to preserve the map structure.
We exploit a fusion network consisting of four types of interactions, actor-to-lane, lane-to-lane, lane-to-actor and actor-to-actor.
Our approach significantly outperforms the state-of-the-art on the large scale Argoverse motion forecasting benchmark.
arXiv Detail & Related papers (2020-07-27T17:59:49Z)
- Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks [27.86228863466213]
We present a simple, unified approach for estimating maps directly from monocular images using a single end-to-end deep learning architecture.
We demonstrate the effectiveness of our approach by evaluating against several challenging baselines on the NuScenes and Argoverse datasets.
arXiv Detail & Related papers (2020-03-30T12:39:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.