NeMO: Neural Map Growing System for Spatiotemporal Fusion in
Bird's-Eye-View and BDD-Map Benchmark
- URL: http://arxiv.org/abs/2306.04540v1
- Date: Wed, 7 Jun 2023 15:46:15 GMT
- Title: NeMO: Neural Map Growing System for Spatiotemporal Fusion in
Bird's-Eye-View and BDD-Map Benchmark
- Authors: Xi Zhu, Xiya Cao, Zhiwei Dong, Caifa Zhou, Qiangbo Liu, Wei Li,
Yongliang Wang
- Abstract summary: Vision-centric Bird's-Eye View representation is essential for autonomous driving systems.
This work outlines a new paradigm, named NeMO, for generating local maps through the utilization of a readable and writable big map.
With an assumption that the feature distribution of all BEV grids follows an identical pattern, we adopt a shared-weight neural network for all grids to update the big map.
- Score: 9.430779563669908
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision-centric Bird's-Eye View (BEV) representation is essential for
autonomous driving systems (ADS). Multi-frame temporal fusion which leverages
historical information has been demonstrated to provide more comprehensive
perception results. While most research focuses on ego-centric maps of fixed
settings, long-range local map generation remains less explored. This work
outlines a new paradigm, named NeMO, for generating local maps through the
utilization of a readable and writable big map, a learning-based fusion module,
and an interaction mechanism between the two. With an assumption that the
feature distribution of all BEV grids follows an identical pattern, we adopt a
shared-weight neural network for all grids to update the big map. This paradigm
supports the fusion of longer time series and the generation of long-range BEV
local maps. Furthermore, we release BDD-Map, a BDD100K-based dataset
incorporating map element annotations, including lane lines, boundaries, and
pedestrian crossings. Experiments on the NuScenes and BDD-Map datasets
demonstrate that NeMO outperforms state-of-the-art map segmentation methods. We
also provide a new scene-level BEV map evaluation setting along with the
corresponding baseline for a more comprehensive comparison.
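The abstract's core assumption, that all BEV grid features follow an identical distribution, implies that a single shared-weight network can update every cell of the big map. The paper does not specify the fusion architecture; the sketch below is purely illustrative, using a small gated update (GRU-like, a hypothetical choice) applied identically to each grid cell of a readable and writable map:

```python
import numpy as np

class SharedGridUpdater:
    """Illustrative sketch (not the authors' implementation) of NeMO's
    shared-weight grid update: one weight set is applied identically to
    every BEV grid cell, reflecting the assumption that all grid features
    follow the same distribution."""

    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # A single shared weight set for *all* grid cells.
        self.Wz = rng.standard_normal((2 * dim, dim)) * 0.1  # gate weights
        self.Wh = rng.standard_normal((2 * dim, dim)) * 0.1  # candidate weights

    def update(self, big_map, local_bev, r0, c0):
        # big_map:   (H, W, dim) persistent, readable/writable map
        # local_bev: (h, w, dim) current-frame ego-centric BEV features
        # (r0, c0):  top-left corner of the local window in big-map coords
        h, w, d = local_bev.shape
        window = big_map[r0:r0 + h, c0:c0 + w]        # read from big map
        x = np.concatenate([local_bev, window], axis=-1)
        z = 1.0 / (1.0 + np.exp(-(x @ self.Wz)))      # update gate
        cand = np.tanh(x @ self.Wh)                   # candidate features
        # Gated blend of new observation and stored map, written back in place.
        big_map[r0:r0 + h, c0:c0 + w] = z * cand + (1 - z) * window
        return big_map
```

Because the update operates on a window of the big map rather than a fixed ego-centric extent, repeated writes from successive frames naturally grow a long-range local map, which is the behavior the paradigm targets.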
Related papers
- Progressive Query Refinement Framework for Bird's-Eye-View Semantic Segmentation from Surrounding Images [3.495246564946556]
We introduce the Multi-Resolution (MR) concept into Bird's-Eye-View (BEV) semantic segmentation for autonomous driving.
We propose a visual feature interaction network that promotes interactions between features across images and across feature levels.
We evaluate our model on a large-scale real-world dataset.
arXiv Detail & Related papers (2024-07-24T05:00:31Z)
- Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data [3.1968751101341173]
Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks.
While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to the small regions captured by current autonomous vehicle-based datasets.
We show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms.
arXiv Detail & Related papers (2024-07-11T17:57:22Z)
- MV-Map: Offboard HD-Map Generation with Multi-view Consistency [29.797769409113105]
Bird's-eye-view (BEV) perception models can be useful for building high-definition maps (HD-Maps) with less human labor.
However, their results are often unreliable, and the predicted HD-Maps exhibit noticeable inconsistencies across different viewpoints.
This paper advocates a more practical 'offboard' HD-Map generation setup that removes the computation constraints.
arXiv Detail & Related papers (2023-05-15T17:59:15Z)
- Neural Map Prior for Autonomous Driving [17.198729798817094]
High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments.
The traditional method of creating offline HD maps involves labor-intensive manual annotation.
Recent studies have proposed an alternative approach that generates local maps using online sensor observations.
In this study, we propose Neural Map Prior (NMP), a neural representation of global maps.
arXiv Detail & Related papers (2023-04-17T17:58:40Z)
- BEVBert: Multimodal Map Pre-training for Language-guided Navigation [75.23388288113817]
We propose a new map-based pre-training paradigm that is spatial-aware for use in vision-and-language navigation (VLN).
We build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map.
Based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatial-aware cross-modal reasoning thereby facilitating the language-guided navigation goal.
arXiv Detail & Related papers (2022-12-08T16:27:54Z)
- Long-term Visual Map Sparsification with Heterogeneous GNN [47.12309045366042]
In this paper, we aim to overcome the environmental changes and reduce the map size at the same time by selecting points that are valuable to future localization.
Inspired by recent progress in Graph Neural Networks (GNNs), we propose the first work that models SfM maps as heterogeneous graphs and predicts 3D point importance scores with a GNN.
Two novel supervisions are proposed: 1) a data-fitting term for selecting valuable points to future localization based on training queries; 2) a K-Cover term for selecting sparse points with full map coverage.
arXiv Detail & Related papers (2022-03-29T01:46:12Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- HDMapNet: An Online HD Map Construction and Evaluation Framework [23.19001503634617]
HD map construction is a crucial problem for autonomous driving.
Traditional HD maps are coupled with centimeter-level accurate localization, which is unreliable in many scenarios.
Online map learning is a more scalable way to provide semantic and geometry priors to self-driving vehicles.
arXiv Detail & Related papers (2021-07-13T18:06:46Z)
- HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps [81.86923212296863]
HD maps are maps with precise definitions of road lanes with rich semantics of the traffic rules.
Only a small number of real-world road topologies and geometries are available, which significantly limits our ability to test the self-driving stack.
We propose HDMapGen, a hierarchical graph generation model capable of producing high-quality and diverse HD maps.
arXiv Detail & Related papers (2021-06-28T17:59:30Z)
- Label Decoupling Framework for Salient Object Detection [157.96262922808245]
Recent methods mainly focus on aggregating multi-level features from fully convolutional networks (FCNs) and introducing edge information as auxiliary supervision.
We propose a label decoupling framework (LDF), which consists of a label decoupling procedure and a feature interaction network (FIN).
Experiments on six benchmark datasets demonstrate that LDF outperforms state-of-the-art approaches on different evaluation metrics.
arXiv Detail & Related papers (2020-08-25T14:23:38Z)
- Rethinking Localization Map: Towards Accurate Object Perception with Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision.
In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.