Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
- URL: http://arxiv.org/abs/2407.06683v1
- Date: Tue, 9 Jul 2024 08:59:27 GMT
- Title: Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
- Authors: Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic
- Abstract summary: We propose exposing the rich internal features of online map estimation methods and show how they enable more tightly integrating online mapping with trajectory forecasting.
In doing so, we find that directly accessing internal BEV features yields up to 73% faster inference speeds and up to 29% more accurate predictions on the real-world nuScenes dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding road geometry is a critical component of the autonomous vehicle (AV) stack. While high-definition (HD) maps can readily provide such information, they suffer from high labeling and maintenance costs. Accordingly, many recent works have proposed methods for estimating HD maps online from sensor data. The vast majority of recent approaches encode multi-camera observations into an intermediate representation, e.g., a bird's eye view (BEV) grid, and produce vector map elements via a decoder. While this architecture is performant, it decimates much of the information encoded in the intermediate representation, preventing downstream tasks (e.g., behavior prediction) from leveraging it. In this work, we propose exposing the rich internal features of online map estimation methods and show how they enable more tightly integrating online mapping with trajectory forecasting. In doing so, we find that directly accessing internal BEV features yields up to 73% faster inference speeds and up to 29% more accurate predictions on the real-world nuScenes dataset.
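The abstract does not spell out how a downstream model would read the BEV grid directly. As a rough illustration only, the sketch below shows generic single-head scaled dot-product cross-attention from one agent embedding to every cell of a BEV feature grid. It is not the authors' architecture: the function names, the shapes (`H`, `W`, `C`, `D_AGENT`, `D`), and the random projection matrices are all hypothetical stand-ins chosen for this example.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a row vector of length n by an n x m matrix (list of rows)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def bev_cross_attention(agent_query, bev_grid, w_q, w_k, w_v):
    """Attend from one agent embedding to all cells of a BEV feature grid.

    All inputs are hypothetical: bev_grid is an H x W grid of C-dim features,
    and w_q, w_k, w_v are projection matrices into a shared D-dim space.
    """
    # Flatten the H x W grid into a sequence of H*W feature tokens.
    tokens = [cell for row in bev_grid for cell in row]
    q = matvec(agent_query, w_q)
    keys = [matvec(t, w_k) for t in tokens]
    values = [matvec(t, w_v) for t in tokens]

    # Scaled dot-product scores between the agent query and every BEV cell.
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]

    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of value vectors gives the map context for this agent.
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights


def random_matrix(rows, cols, rng):
    """Random Gaussian matrix standing in for learned projection weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
H, W, C, D_AGENT, D = 4, 5, 8, 6, 16  # illustrative sizes, not from the paper
bev_grid = [[[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(W)] for _ in range(H)]
agent_query = [rng.gauss(0.0, 1.0) for _ in range(D_AGENT)]
w_q = random_matrix(D_AGENT, D, rng)
w_k = random_matrix(C, D, rng)
w_v = random_matrix(C, D, rng)

context, weights = bev_cross_attention(agent_query, bev_grid, w_q, w_k, w_v)
```

Here `context` is a `D`-dimensional summary of the grid for one agent, and `weights` holds one attention weight per BEV cell. In a trained model the projections would be learned, and a practical version would likely restrict attention to a region around each agent rather than the whole grid.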
Related papers
- SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations [3.8472678261304587]
We propose a modular pipeline for vector map generation with improved generalization to sensor configurations.
By adopting a BEV semantic map robust to different sensor configurations, our proposed approach significantly improves the generalization performance.
arXiv Detail & Related papers (2024-04-30T23:45:16Z)
- Producing and Leveraging Online Map Uncertainty in Trajectory Prediction [30.190497345299004]
We extend state-of-the-art online map estimation methods to additionally estimate uncertainty.
In doing so, we find that incorporating uncertainty yields up to 50% faster training convergence and up to 15% better prediction performance.
arXiv Detail & Related papers (2024-03-25T05:58:33Z)
- Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps [51.24861159115138]
Standard Definition (SD) maps are more affordable and have worldwide coverage, offering a scalable alternative.
We propose a novel framework to integrate SD maps into online map prediction and propose a Transformer-based encoder, SD Map Representations from transFormers.
This enhancement consistently and significantly boosts (by up to 60%) lane detection and topology prediction on current state-of-the-art online map prediction methods.
arXiv Detail & Related papers (2023-11-07T15:42:22Z)
- StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction [36.1596833523566]
We present StreamMapNet, a novel online mapping pipeline adept at long-sequence temporal modeling of videos.
StreamMapNet employs multi-point attention and temporal information which empowers the construction of large-range local HD maps with high stability.
arXiv Detail & Related papers (2023-08-24T05:22:43Z)
- InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping [41.59891369655983]
InsMapper harnesses inner-instance information for vectorized high-definition mapping through transformers.
InsMapper surpasses the previous state-of-the-art method, demonstrating its effectiveness and generality.
arXiv Detail & Related papers (2023-08-16T17:58:28Z)
- Prior Based Online Lane Graph Extraction from Single Onboard Camera Image [133.68032636906133]
We tackle online estimation of the lane graph from a single onboard camera image.
The prior is extracted from the dataset through a transformer based Wasserstein Autoencoder.
The autoencoder is then used to enhance the initial lane graph estimates.
arXiv Detail & Related papers (2023-07-25T08:58:26Z)
- InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning [6.062751776009753]
We propose an online HD map learning framework that detects HD map elements from onboard sensor observations.
InstaGraM, an instance-level graph modeling approach for HD maps, enables accurate and fast end-to-end vectorized HD map learning.
Our proposed network outperforms previous models by up to 13.7 mAP while running up to 33.8x faster.
arXiv Detail & Related papers (2023-01-10T08:15:35Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy that guides the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting [121.42898228997538]
We propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization.
We leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph.
Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction.
arXiv Detail & Related papers (2022-11-04T16:10:50Z)
- Radar-based Dynamic Occupancy Grid Mapping and Object Detection [55.74894405714851]
In recent years, the classical occupancy grid map approach has been extended to dynamic occupancy grid maps.
This paper presents the further development of a previous approach.
The data of multiple radar sensors are fused, and a grid-based object tracking and mapping method is applied.
arXiv Detail & Related papers (2020-08-09T09:26:30Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.