NeMO: Neural Map Growing System for Spatiotemporal Fusion in
Bird's-Eye-View and BDD-Map Benchmark
- URL: http://arxiv.org/abs/2306.04540v1
- Date: Wed, 7 Jun 2023 15:46:15 GMT
- Title: NeMO: Neural Map Growing System for Spatiotemporal Fusion in
Bird's-Eye-View and BDD-Map Benchmark
- Authors: Xi Zhu, Xiya Cao, Zhiwei Dong, Caifa Zhou, Qiangbo Liu, Wei Li,
Yongliang Wang
- Abstract summary: Vision-centric Bird's-Eye View representation is essential for autonomous driving systems.
This work outlines a new paradigm, named NeMO, which generates local maps using a readable and writable big map.
Assuming that the features of all BEV grids follow the same distribution, we adopt a single shared-weight neural network to update every grid of the big map.
- Score: 9.430779563669908
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision-centric Bird's-Eye View (BEV) representation is essential for
autonomous driving systems (ADS). Multi-frame temporal fusion which leverages
historical information has been demonstrated to provide more comprehensive
perception results. While most research focuses on ego-centric maps of fixed
settings, long-range local map generation remains less explored. This work
outlines a new paradigm, named NeMO, which generates local maps using a
readable and writable big map, a learning-based fusion module, and an
interaction mechanism between the two. Assuming that the features of all BEV
grids follow the same distribution, we adopt a single shared-weight neural
network to update every grid of the big map. This paradigm
supports the fusion of longer time series and the generation of long-range BEV
local maps. Furthermore, we release BDD-Map, a BDD100K-based dataset
incorporating map element annotations, including lane lines, boundaries, and
pedestrian crossings. Experiments on the NuScenes and BDD-Map datasets
demonstrate that NeMO outperforms state-of-the-art map segmentation methods. We
also provide a new scene-level BEV map evaluation setting along with the
corresponding baseline for a more comprehensive comparison.
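As a concrete illustration of the read-fuse-write loop the abstract describes, the following is a minimal PyTorch sketch. The BigMap container, the grid indexing, and the use of nn.GRUCell as the shared-weight updater are assumptions made for exposition, not the authors' published interface.

# Hypothetical sketch of a NeMO-style big-map update. Names and the
# GRU-based fusion are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class BigMap:
    """A readable and writable global BEV feature map."""
    def __init__(self, height, width, channels):
        self.features = torch.zeros(height, width, channels)

    def read(self, ys, xs):
        # Fetch stored features for the grids touched by the current frame.
        return self.features[ys, xs]

    def write(self, ys, xs, feats):
        self.features[ys, xs] = feats

class GridFusion(nn.Module):
    """One shared-weight network applied identically to every BEV grid,
    reflecting the assumption that all grids share one feature
    distribution."""
    def __init__(self, channels):
        super().__init__()
        self.cell = nn.GRUCell(channels, channels)

    def forward(self, obs_feats, map_feats):
        # obs_feats, map_feats: (N, C) for the N grids seen this frame.
        return self.cell(obs_feats, map_feats)

C = 64
big_map = BigMap(1024, 1024, C)
fuse = GridFusion(C)

# One fusion step: read the touched grids, fuse them with the current
# BEV observation, and write the result back into the big map.
ys = torch.randint(0, 1024, (500,))
xs = torch.randint(0, 1024, (500,))
obs = torch.randn(500, C)  # per-grid BEV features from an image encoder
with torch.no_grad():
    big_map.write(ys, xs, fuse(obs, big_map.read(ys, xs)))

Because the updater is shared across grids, the same weights can grow a map of arbitrary extent, which is what supports the long-range local maps the abstract claims.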
Related papers
- TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard definition maps (SDMaps).
We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information.
Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
arXiv Detail & Related papers (2024-11-22T06:13:42Z)
- VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car.
We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space.
Thanks to the obtained BEV tokens, accompanied by a codebook embedding that encapsulates the semantics of different BEV elements in the ground-truth maps, we can directly align the sparse backbone image features with these tokens (a minimal quantization sketch follows this entry).
arXiv Detail & Related papers (2024-11-03T16:09:47Z)
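The quantization step behind VQ-Map-style tokenization can be sketched in a few lines; the shapes and the plain nearest-neighbour lookup below are illustrative assumptions, not the paper's exact model, which learns the codebook end to end.

# Hypothetical nearest-codebook lookup for BEV tokenization.
import torch

def quantize(feats, codebook):
    # feats: (N, C) BEV grid features; codebook: (K, C) embeddings.
    dists = torch.cdist(feats, codebook)   # (N, K) pairwise distances
    tokens = dists.argmin(dim=1)           # discrete BEV token per grid
    return tokens, codebook[tokens]        # token ids and quantized feats

feats = torch.randn(100, 32)
codebook = torch.randn(512, 32)
tokens, quantized = quantize(feats, codebook)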
- Enhancing Vectorized Map Perception with Historical Rasterized Maps [37.48510990922406]
We propose HRMapNet, which leverages a low-cost historical rasterized map to enhance online vectorized map perception.
The historical map can be easily constructed from past predicted vectorized results and provides valuable complementary information (a toy rasterization sketch follows this entry).
HRMapNet can be integrated with most online vectorized map perception methods.
arXiv Detail & Related papers (2024-09-01T05:22:33Z)
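To make the "historical rasterized map" idea concrete, here is a toy rasterizer that burns past predicted polylines into a BEV grid; the resolution, sampling scheme, and single-class grid are assumptions, not HRMapNet's actual pipeline.

# Hypothetical rasterization of past vectorized predictions into a
# single-channel historical map (assumed 0.15 m/cell resolution).
import numpy as np

def rasterize(polylines, shape, res=0.15):
    # polylines: list of (N, 2) arrays of map-frame points in metres.
    grid = np.zeros(shape, dtype=np.uint8)
    for line in polylines:
        for (x0, y0), (x1, y1) in zip(line[:-1], line[1:]):
            # Sample each segment densely enough to touch every cell.
            n = max(2, int(np.hypot(x1 - x0, y1 - y0) / res) + 1)
            xs = np.linspace(x0, x1, n) / res
            ys = np.linspace(y0, y1, n) / res
            ok = (ys >= 0) & (ys < shape[0]) & (xs >= 0) & (xs < shape[1])
            grid[ys[ok].astype(int), xs[ok].astype(int)] = 1
    return grid

lane = np.array([[1.0, 1.0], [6.0, 2.5], [12.0, 2.0]])
historical_map = rasterize([lane], shape=(200, 200))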
- Progressive Query Refinement Framework for Bird's-Eye-View Semantic Segmentation from Surrounding Images [3.495246564946556]
We introduce the Multi-Resolution (MR) concept into Bird's-Eye-View (BEV) semantic segmentation for autonomous driving.
We propose a visual feature interaction network that promotes interactions between features across images and across feature levels.
We evaluate our model on a large-scale real-world dataset.
arXiv Detail & Related papers (2024-07-24T05:00:31Z)
- Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data [3.1968751101341173]
Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks.
Recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, but their generalizability is limited to small regions captured by current autonomous vehicle-based datasets.
We show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms.
arXiv Detail & Related papers (2024-07-11T17:57:22Z)
- MV-Map: Offboard HD-Map Generation with Multi-view Consistency [29.797769409113105]
Bird's-eye-view (BEV) perception models can be useful for building high-definition maps (HD-Maps) with less human labor.
However, their results are often unreliable and show noticeable inconsistencies in the predicted HD-Maps from different viewpoints.
This paper advocates a more practical 'offboard' HD-Map generation setup that removes the computation constraints.
arXiv Detail & Related papers (2023-05-15T17:59:15Z)
- Neural Map Prior for Autonomous Driving [17.198729798817094]
High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments.
The traditional method of creating offline HD maps involves a labor-intensive manual annotation process.
Recent studies have proposed an alternative approach that generates local maps using online sensor observations.
In this study, we propose Neural Map Prior (NMP), a neural representation of global maps.
arXiv Detail & Related papers (2023-04-17T17:58:40Z)
- BEVBert: Multimodal Map Pre-training for Language-guided Navigation [75.23388288113817]
We propose a new spatially aware, map-based pre-training paradigm for vision-and-language navigation (VLN).
We build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map.
Based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatially aware cross-modal reasoning and thereby facilitates language-guided navigation.
arXiv Detail & Related papers (2022-12-08T16:27:54Z)
- Long-term Visual Map Sparsification with Heterogeneous GNN [47.12309045366042]
In this paper, we aim to overcome the environmental changes and reduce the map size at the same time by selecting points that are valuable to future localization.
Inspired by recent progress in Graph Neural Networks (GNNs), we propose the first work that models SfM maps as heterogeneous graphs and predicts 3D point importance scores with a GNN.
Two novel supervisions are proposed: 1) a data-fitting term for selecting points valuable for future localization based on training queries; 2) a K-Cover term for selecting sparse points with full map coverage (a greedy selection sketch follows this entry).
arXiv Detail & Related papers (2022-03-29T01:46:12Z)
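The K-Cover supervision can be read as a coverage-constrained selection problem. The greedy routine below sketches that reading with hypothetical region bins and random scores standing in for the paper's learned GNN importance scores; it is an interpretation for illustration, not the paper's training objective.

# Hypothetical greedy K-Cover selection: keep high-scoring points while
# guaranteeing at least k points per map region.
import numpy as np

def k_cover_select(scores, regions, k, budget):
    order = np.argsort(-scores)          # best-scoring points first
    kept, per_region = [], {}
    for i in order:                      # pass 1: satisfy coverage
        r = regions[i]
        if per_region.get(r, 0) < k:
            kept.append(i)
            per_region[r] = per_region.get(r, 0) + 1
    chosen = set(kept)
    for i in order:                      # pass 2: spend remaining budget
        if len(kept) >= budget:
            break
        if i not in chosen:
            kept.append(i)
            chosen.add(i)
    return np.asarray(kept[:budget])

scores = np.random.rand(1000)             # stand-in importance scores
regions = np.random.randint(0, 50, 1000)  # stand-in region ids
selected = k_cover_select(scores, regions, k=3, budget=300)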
- HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps [81.86923212296863]
HD maps are maps that precisely define road lanes together with rich semantics of the traffic rules.
Only a small number of real-world road topologies and geometries are available, which significantly limits our ability to test the self-driving stack.
We propose HDMapGen, a hierarchical graph generation model capable of producing high-quality and diverse HD maps.
arXiv Detail & Related papers (2021-06-28T17:59:30Z)
- Label Decoupling Framework for Salient Object Detection [157.96262922808245]
Recent methods mainly focus on aggregating multi-level features from fully convolutional networks (FCNs) and introducing edge information as auxiliary supervision.
We propose a label decoupling framework (LDF) which consists of a label decoupling procedure and a feature interaction network (FIN).
Experiments on six benchmark datasets demonstrate that LDF outperforms state-of-the-art approaches on different evaluation metrics.
arXiv Detail & Related papers (2020-08-25T14:23:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.