VAD: Vectorized Scene Representation for Efficient Autonomous Driving
- URL: http://arxiv.org/abs/2303.12077v3
- Date: Thu, 24 Aug 2023 08:15:35 GMT
- Title: VAD: Vectorized Scene Representation for Efficient Autonomous Driving
- Authors: Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong
Zhou, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang
- Abstract summary: VAD is an end-to-end vectorized paradigm for autonomous driving.
VAD exploits the vectorized agent motion and map elements as explicit instance-level planning constraints.
VAD runs much faster than previous end-to-end planning methods.
- Score: 44.070636456960045
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Autonomous driving requires a comprehensive understanding of the surrounding
environment for reliable trajectory planning. Previous works rely on dense
rasterized scene representation (e.g., agent occupancy and semantic map) to
perform planning, which is computationally intensive and misses the
instance-level structure information. In this paper, we propose VAD, an
end-to-end vectorized paradigm for autonomous driving, which models the driving
scene as a fully vectorized representation. The proposed vectorized paradigm
has two significant advantages. On one hand, VAD exploits the vectorized agent
motion and map elements as explicit instance-level planning constraints which
effectively improves planning safety. On the other hand, VAD runs much faster
than previous end-to-end planning methods by getting rid of
computation-intensive rasterized representation and hand-designed
post-processing steps. VAD achieves state-of-the-art end-to-end planning
performance on the nuScenes dataset, outperforming the previous best method by
a large margin. Our base model, VAD-Base, greatly reduces the average collision
rate by 29.0% and runs 2.5x faster. Besides, a lightweight variant, VAD-Tiny,
greatly improves the inference speed (up to 9.3x) while achieving comparable
planning performance. We believe the excellent performance and the high
efficiency of VAD are critical for the real-world deployment of an autonomous
driving system. Code and models are available at https://github.com/hustvl/VAD
for facilitating future research.
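The contrast the abstract draws between dense rasterized and vectorized scene representations can be illustrated with a toy example. This is a hypothetical sketch, not the paper's code; the grid size, lane coordinates, and variable names are illustrative assumptions.

```python
import numpy as np

# Toy illustration (not the paper's code): the same two lane boundaries
# stored as a dense rasterized BEV grid vs. as vectorized polylines.

# Rasterized: a 200 x 200 semantic grid, mostly zeros.
H = W = 200
raster = np.zeros((H, W), dtype=np.uint8)
raster[100, 20:180] = 1   # lane boundary A drawn as pixels
raster[120, 20:180] = 1   # lane boundary B drawn as pixels

# Vectorized: each map element is a short polyline of (x, y) points,
# keeping instance-level structure (which points belong to which lane).
lane_a = np.array([[20.0, 100.0], [100.0, 100.0], [180.0, 100.0]])
lane_b = np.array([[20.0, 120.0], [100.0, 120.0], [180.0, 120.0]])
vectorized = [lane_a, lane_b]

dense_cells = raster.size                        # 40000 grid cells
vector_coords = sum(p.size for p in vectorized)  # 12 coordinates
print(dense_cells, vector_coords)
```

Because each lane remains a named instance rather than anonymous pixels, an instance-level planning constraint of the kind the abstract describes (e.g., the distance from a planned waypoint to the nearest point of a specific lane polyline) can be computed directly, without the hand-designed post-processing a raster would require.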
Related papers
- End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation [34.070813293944944]
We propose UAD, a method for vision-based end-to-end autonomous driving (E2EAD)
Our motivation stems from the observation that current E2EAD models still mimic the modular architecture in typical driving stacks.
Our UAD achieves a 38.7% relative improvement over UniAD on the average collision rate in nuScenes and surpasses VAD by 41.32 points on the driving score in CARLA's Town05 Long benchmark.
arXiv Detail & Related papers (2024-06-25T16:12:52Z)
- SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation [11.011219709863875]
We propose a new end-to-end autonomous driving paradigm named SparseDrive.
SparseDrive consists of a symmetric sparse perception module and a parallel motion planner.
For motion prediction and planning, we revisit the great similarity between these two tasks, leading to a parallel design for the motion planner.
arXiv Detail & Related papers (2024-05-30T02:13:56Z)
- SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations [3.8472678261304587]
We propose a modular pipeline for vector map generation with improved generalization to sensor configurations.
By adopting a BEV semantic map robust to different sensor configurations, our proposed approach significantly improves the generalization performance.
arXiv Detail & Related papers (2024-04-30T23:45:16Z)
- VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning [42.681012361021224]
VADv2 is an end-to-end driving model based on probabilistic planning.
It runs stably in a fully end-to-end manner, even without the rule-based wrapper.
arXiv Detail & Related papers (2024-02-20T18:55:09Z)
- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory prediction of the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z)
- Trajectory Prediction with Observations of Variable-Length for Motion Planning in Highway Merging scenarios [5.193470362635256]
Existing methods cannot initiate prediction for a vehicle unless it has been observed for a fixed duration of two or more seconds.
This paper proposes a novel transformer-based trajectory prediction approach, specifically trained to handle any observation length larger than one frame.
We perform a comprehensive evaluation of the proposed method using two large-scale highway trajectory datasets.
arXiv Detail & Related papers (2023-06-08T18:03:48Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting [121.42898228997538]
We propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization.
We leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph.
Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction.
arXiv Detail & Related papers (2022-11-04T16:10:50Z)
- The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well-defined geometries, topologies, and traffic rules.
In this paper we propose to incorporate structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
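The polyline encoding described in the VectorNet entry above can be sketched in a few lines. This is a hypothetical numpy sketch, not the authors' implementation: the embedding width, the single shared linear layer, and the specific vector format `[start_x, start_y, end_x, end_y]` are simplifying assumptions standing in for the paper's hierarchical graph network.

```python
import numpy as np

# Hypothetical sketch of a VectorNet-style polyline encoder (not the paper's
# code): a polyline becomes a set of vectors, each vector is embedded by a
# shared weight matrix, and the set is pooled with an order-invariant max.

rng = np.random.default_rng(0)
d = 8                                   # embedding width (illustrative)
W_shared = rng.standard_normal((4, d))  # shared across all vectors

def polyline_to_vectors(points):
    """(N, 2) points -> (N-1, 4) vectors [start_x, start_y, end_x, end_y]."""
    return np.concatenate([points[:-1], points[1:]], axis=1)

def encode_vectors(vecs, weight):
    """Shared linear layer + ReLU, then permutation-invariant max-pool."""
    embedded = np.maximum(vecs @ weight, 0.0)
    return embedded.max(axis=0)

lane = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1], [3.0, 0.3]])
vecs = polyline_to_vectors(lane)

feat = encode_vectors(vecs, W_shared)
feat_perm = encode_vectors(vecs[::-1], W_shared)  # same set, reversed order

# Max-pooling makes the feature independent of vector ordering.
print(np.allclose(feat, feat_perm))  # True
```

Operating on such vectors directly is what lets VectorNet (and, downstream, VAD) skip the lossy rendering and ConvNet encoding steps that dense rasterized pipelines require.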
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.