Related papers: Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

URL: http://arxiv.org/abs/2407.08726v1
Date: Thu, 11 Jul 2024 17:57:22 GMT
Title: Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data
Authors: Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer
Abstract summary: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. Recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, but their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. We show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms.
Score: 3.1968751101341173
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we display the ease of automatically collecting a dataset of 1.2 million pairs of FPV images & BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance far exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation.

Related papers

Unified Human Localization and Trajectory Prediction with Monocular Vision [64.19384064365431]
MonoTransmotion is a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios with noisy inputs.
arXiv Detail & Related papers (2025-03-05T14:18:39Z)
TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard definition maps (SDMaps). We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate these complementary features as prior information. Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
arXiv Detail & Related papers (2024-11-22T06:13:42Z)
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car. We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space. Thanks to the obtained BEV tokens accompanied with a codebook embedding encapsulating the semantics for different BEV elements in the groundtruth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens.
arXiv Detail & Related papers (2024-11-03T16:09:47Z)
Enhancing Vectorized Map Perception with Historical Rasterized Maps [37.48510990922406]
We propose HRMapNet, leveraging a low-cost Historical Rasterized Map to enhance online vectorized map perception. The historical rasterized map can be easily constructed from past predicted vectorized results and provides valuable complementary information. HRMapNet can be integrated with most online vectorized map perception methods.
arXiv Detail & Related papers (2024-09-01T05:22:33Z)
Progressive Query Refinement Framework for Bird's-Eye-View Semantic Segmentation from Surrounding Images [3.495246564946556]
We introduce the Multi-Resolution (MR) concept into Bird's-Eye-View (BEV) semantic segmentation for autonomous driving. We propose a visual feature interaction network that promotes interactions between features across images and across feature levels. We evaluate our model on a large-scale real-world dataset.
arXiv Detail & Related papers (2024-07-24T05:00:31Z)
Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention [30.190497345299004]
We propose exposing the rich internal features of online map estimation methods and show how they enable more tightly integrating online mapping with trajectory forecasting. In doing so, we find that directly accessing internal BEV features yields up to 73% faster inference speeds and up to 29% more accurate predictions on the real-world nuScenes dataset.
arXiv Detail & Related papers (2024-07-09T08:59:27Z)
Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps [13.524499163234342]
We propose a new model capable of performing zero-shot projections of any modality available in a first person view to the corresponding BEV map. We experimentally show that the model outperforms competing methods, in particular the widely used baseline resorting to monocular depth estimation.
arXiv Detail & Related papers (2024-02-21T14:50:24Z)
U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance. This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
arXiv Detail & Related papers (2023-10-20T18:57:38Z)
BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving. In this paper, we propose BEVTrack, a simple yet effective baseline method. By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance.
arXiv Detail & Related papers (2023-09-05T12:42:26Z)
NeMO: Neural Map Growing System for Spatiotemporal Fusion in Bird's-Eye-View and BDD-Map Benchmark [9.430779563669908]
Vision-centric Bird's-Eye View representation is essential for autonomous driving systems. This work outlines a new paradigm, named NeMO, for generating local maps through the utilization of a readable and writable big map. With an assumption that the feature distribution of all BEV grids follows an identical pattern, we adopt a shared-weight neural network for all grids to update the big map.
arXiv Detail & Related papers (2023-06-07T15:46:15Z)
BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images [13.258689143949912]
We propose an end-to-end visual semantic localization neural network using multi-view camera images. BEV-Locator is capable of estimating vehicle poses under versatile scenarios. Experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m and 0.251° in lateral translation, longitudinal translation and heading angle, respectively.
arXiv Detail & Related papers (2022-11-27T20:24:56Z)
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view. Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems. We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z)
BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object Detection for Autonomous Driving [2.9769485817170387]
CNNs can leverage the global context in the scene to project better. We create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes. We observe a significant improvement of 13% in mIoU using the simple baseline implementation.
arXiv Detail & Related papers (2021-07-11T01:11:58Z)
OpenREALM: Real-time Mapping for Unmanned Aerial Vehicles [62.997667081978825]
OpenREALM is a real-time mapping framework for Unmanned Aerial Vehicles (UAVs). Different modes of operation allow OpenREALM to perform simple stitching assuming approximately planar ground. In all modes, incremental progress of the resulting map can be viewed live by an operator on the ground.
arXiv Detail & Related papers (2020-09-22T12:28:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.