Related papers: Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention

URL: http://arxiv.org/abs/2407.06683v1
Date: Tue, 9 Jul 2024 08:59:27 GMT
Title: Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
Authors: Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic
Abstract summary: We propose exposing the rich internal features of online map estimation methods and show how they enable more tightly integrating online mapping with trajectory forecasting. In doing so, we find that directly accessing internal BEV features yields up to 73% faster inference speeds and up to 29% more accurate predictions on the real-world nuScenes dataset.
Score: 30.190497345299004
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding road geometry is a critical component of the autonomous vehicle (AV) stack. While high-definition (HD) maps can readily provide such information, they suffer from high labeling and maintenance costs. Accordingly, many recent works have proposed methods for estimating HD maps online from sensor data. The vast majority of recent approaches encode multi-camera observations into an intermediate representation, e.g., a bird's eye view (BEV) grid, and produce vector map elements via a decoder. While this architecture is performant, it decimates much of the information encoded in the intermediate representation, preventing downstream tasks (e.g., behavior prediction) from leveraging it. In this work, we propose exposing the rich internal features of online map estimation methods and show how they enable more tightly integrating online mapping with trajectory forecasting. In doing so, we find that directly accessing internal BEV features yields up to 73% faster inference speeds and up to 29% more accurate predictions on the real-world nuScenes dataset.
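The abstract describes exposing the map encoder's BEV grid so the trajectory predictor can attend to it directly. The summary does not specify the attention mechanism, so the following is only a minimal sketch assuming plain single-head scaled dot-product cross-attention, with per-agent embeddings as queries and flattened BEV cells as keys and values. The function name `bev_cross_attention`, the random projection matrices, and all shapes (3 agents, a 50 x 25 grid with 32 channels) are hypothetical illustrations, not the paper's configuration.

```python
import numpy as np

def bev_cross_attention(agent_feats, bev_feats, w_q, w_k, w_v):
    """Single-head cross-attention from agent tokens to a BEV feature grid.

    Hypothetical sketch, not the paper's architecture.
    agent_feats: (num_agents, d_agent) per-agent embeddings.
    bev_feats:   (H, W, c_bev) BEV grid produced by a map encoder.
    w_q, w_k, w_v: projection matrices (random here, normally learned).
    Returns (context, weights) with shapes (num_agents, d) and (num_agents, H*W).
    """
    h, w, c = bev_feats.shape
    tokens = bev_feats.reshape(h * w, c)            # flatten the grid into H*W tokens
    q = agent_feats @ w_q                           # (num_agents, d)
    k = tokens @ w_k                                # (H*W, d)
    v = tokens @ w_v                                # (H*W, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])         # scaled dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over BEV cells
    context = weights @ v                           # each agent's BEV context vector
    return context, weights

# Usage with hypothetical sizes: 3 agents, a 50 x 25 BEV grid, 32 channels.
rng = np.random.default_rng(0)
agents = rng.normal(size=(3, 16))
bev = rng.normal(size=(50, 25, 32))
d = 8
context, weights = bev_cross_attention(
    agents, bev,
    rng.normal(size=(16, d)), rng.normal(size=(32, d)), rng.normal(size=(32, d)),
)
print(context.shape)  # (3, 8)
```

Full attention over all H*W cells costs on the order of (agents x H*W) per layer. Restricting each agent to nearby cells is one common way to reduce that cost, though the summary does not say which approach the paper takes.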

Related papers

AugMapNet: Improving Spatial Latent Structure via BEV Grid Augmentation for Enhanced Vectorized Online HD Map Construction [10.651014925267859]
AugMapNet is a novel technique that significantly enhances the latent BEV representation. Experiments on nuScenes and Argoverse2 datasets demonstrate significant improvements in vectorized map prediction performance. A detailed analysis of the latent BEV grid confirms a more structured latent space of AugMapNet.
arXiv Detail & Related papers (2025-03-17T17:55:32Z)
TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard definition maps (SDMaps). We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information. Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
arXiv Detail & Related papers (2024-11-22T06:13:42Z)
SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations [3.8472678261304587]
We propose a modular pipeline for vector map generation with improved generalization to sensor configurations. By adopting a BEV semantic map robust to different sensor configurations, our proposed approach significantly improves the generalization performance.
arXiv Detail & Related papers (2024-04-30T23:45:16Z)
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction [30.190497345299004]
We extend state-of-the-art online map estimation methods to additionally estimate uncertainty. In doing so, we find that incorporating uncertainty yields up to 50% faster training convergence and up to 15% better prediction performance.
arXiv Detail & Related papers (2024-03-25T05:58:33Z)
Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps [51.24861159115138]
Standard Definition (SD) maps are more affordable and have worldwide coverage, offering a scalable alternative. We propose a novel framework to integrate SD maps into online map prediction and propose a Transformer-based encoder, SD Map Representations from transFormers. This enhancement consistently and significantly boosts (by up to 60%) lane detection and topology prediction on current state-of-the-art online map prediction methods.
arXiv Detail & Related papers (2023-11-07T15:42:22Z)
StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction [36.1596833523566]
We present StreamMapNet, a novel online mapping pipeline adept at long-sequence temporal modeling of videos. StreamMapNet employs multi-point attention and temporal information which empowers the construction of large-range local HD maps with high stability.
arXiv Detail & Related papers (2023-08-24T05:22:43Z)
Prior Based Online Lane Graph Extraction from Single Onboard Camera Image [133.68032636906133]
We tackle online estimation of the lane graph from a single onboard camera image. The prior is extracted from the dataset through a transformer-based Wasserstein Autoencoder. The autoencoder is then used to enhance the initial lane graph estimates.
arXiv Detail & Related papers (2023-07-25T08:58:26Z)
InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning [6.062751776009753]
We propose an online HD map learning framework that detects HD map elements from onboard sensor observations. InstaGraM, an instance-level graph modeling of HD maps, enables accurate and fast end-to-end vectorized HD map learning. Our proposed network outperforms previous models by up to 13.7 mAP while running up to 33.8X faster.
arXiv Detail & Related papers (2023-01-10T08:15:35Z)
BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving. Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation. We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
Radar-based Dynamic Occupancy Grid Mapping and Object Detection [55.74894405714851]
In recent years, the classical occupancy grid map approach has been extended to dynamic occupancy grid maps. This paper presents the further development of a previous approach. The data of multiple radar sensors are fused, and a grid-based object tracking and mapping method is applied.
arXiv Detail & Related papers (2020-08-09T09:26:30Z)
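The radar entry builds on the classical occupancy grid map. As background only, here is a minimal sketch of the classical static binary Bayes update in log-odds form; it is not the paper's dynamic grid, radar fusion, or object tracking method. The inverse sensor model probabilities (0.7 and 0.3), the clamping bounds, and the function names `update_grid` and `occupancy_prob` are assumptions chosen for illustration.

```python
import numpy as np

# Assumed inverse sensor model (hypothetical values, not from the paper):
L_OCC = np.log(0.7 / 0.3)   # log-odds added when a return lands in a cell
L_FREE = np.log(0.3 / 0.7)  # log-odds added when a beam passes through a cell
L_MIN, L_MAX = -5.0, 5.0    # clamp so cells can still change in later scans

def update_grid(log_odds, hit_mask, free_mask):
    """One static log-odds update; a prior of 0.5 corresponds to log-odds 0."""
    log_odds = (log_odds
                + np.where(hit_mask, L_OCC, 0.0)
                + np.where(free_mask, L_FREE, 0.0))
    return np.clip(log_odds, L_MIN, L_MAX)

def occupancy_prob(log_odds):
    """Convert log-odds back to P(occupied) (the logistic sigmoid)."""
    return 1.0 - 1.0 / (1.0 + np.exp(log_odds))

# Usage: a 4 x 4 grid observed by two identical scans.
grid = np.zeros((4, 4))                  # prior P(occ) = 0.5 everywhere
hits = np.zeros((4, 4), dtype=bool)
hits[1, 2] = True                        # a return landed in cell (1, 2)
free = np.zeros((4, 4), dtype=bool)
free[1, 0:2] = True                      # the beam passed through (1, 0) and (1, 1)
for _ in range(2):
    grid = update_grid(grid, hits, free)
probs = occupancy_prob(grid)
print(round(float(probs[1, 2]), 3))      # 0.845
print(round(float(probs[1, 0]), 3))      # 0.155
print(float(probs[3, 3]))                # 0.5 (never observed)
```

Working in log-odds turns the Bayes update into an addition per cell, and clamping keeps a cell from becoming so certain that new evidence can no longer move it.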
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
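VectorNet encodes each road element or agent trajectory as a polyline of vectors and aggregates each polyline with a local subgraph. The sketch below follows the subgraph layer as described in the VectorNet paper (node encoder, max-pool aggregation over the polyline, concatenation of each node with the pooled feature), but it simplifies the node encoder to a single linear layer with ReLU, omits layer normalization and the global interaction graph, and uses random weights and a hypothetical 4-dimensional vector feature (start and end coordinates) purely for illustration.

```python
import numpy as np

def subgraph_layer(nodes, weight, bias):
    """One polyline subgraph layer in the style of VectorNet (simplified sketch).

    nodes:  (P, c_in) features of the P vectors in one polyline.
    weight: (c_in, c_hidden), bias: (c_hidden,); random here, normally learned.
    Returns (P, 2 * c_hidden): each node's encoding concatenated with the
    max-pooled encoding of all nodes in the polyline.
    """
    encoded = np.maximum(nodes @ weight + bias, 0.0)  # node encoder: linear + ReLU (no layer norm)
    pooled = encoded.max(axis=0, keepdims=True)       # permutation-invariant aggregation
    return np.concatenate([encoded, np.repeat(pooled, len(nodes), axis=0)], axis=1)

def polyline_feature(nodes, params):
    """Stack subgraph layers, then max-pool to a single feature per polyline."""
    x = nodes
    for weight, bias in params:
        x = subgraph_layer(x, weight, bias)
    return x.max(axis=0)

# Usage: one hypothetical lane polyline with 9 vectors (x_start, y_start, x_end, y_end).
rng = np.random.default_rng(0)
lane = rng.normal(size=(9, 4))
params = [
    (rng.normal(size=(4, 16)), np.zeros(16)),   # layer 1: 4 -> 16, concat -> 32
    (rng.normal(size=(32, 16)), np.zeros(16)),  # layer 2: 32 -> 16, concat -> 32
]
feat = polyline_feature(lane, params)
print(feat.shape)  # (32,)
```

Because every operation is either node-wise or a max over nodes, the resulting polyline feature does not depend on the order in which the vectors are listed.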

This list is automatically generated from the titles and abstracts of the papers on this site.