VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic
Planning
- URL: http://arxiv.org/abs/2402.13243v1
- Date: Tue, 20 Feb 2024 18:55:09 GMT
- Title: VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic
Planning
- Authors: Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang,
Chang Huang, Wenyu Liu, Xinggang Wang
- Abstract summary: VADv2 is an end-to-end driving model based on probabilistic planning.
It runs stably in a fully end-to-end manner, even without the rule-based wrapper.
- Score: 42.681012361021224
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning a human-like driving policy from large-scale driving demonstrations
is promising, but the uncertainty and non-deterministic nature of planning make
it challenging. In this work, to cope with the uncertainty problem, we propose
VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes
multi-view image sequences as input in a streaming manner, transforms sensor
data into environmental token embeddings, outputs a probability distribution
over actions, and samples one action to control the vehicle. Using only camera
sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA
Town05 benchmark, significantly outperforming all existing methods.
It runs stably in a fully end-to-end manner, even without the rule-based
wrapper. Closed-loop demos are presented at https://hgao-cv.github.io/VADv2.
Related papers
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner.
Experiments conducted on the nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving [4.628774934971078]
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle.
We introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models.
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU.
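
Final Displacement Error and Miss Rate are standard trajectory-prediction metrics; a minimal sketch of how they are commonly computed (the 2 m miss threshold is a common convention and an assumption here, not necessarily this paper's setting):

```python
import numpy as np

def fde_and_miss_rate(pred, gt, miss_threshold=2.0):
    """Final Displacement Error and Miss Rate for predicted trajectories.

    pred, gt: arrays of shape (num_agents, num_timesteps, 2) in meters.
    miss_threshold: endpoint error (m) above which a prediction counts as a
                    miss; 2.0 m is a common convention, assumed here.
    """
    endpoint_err = np.linalg.norm(pred[:, -1] - gt[:, -1], axis=-1)  # (num_agents,)
    fde = endpoint_err.mean()
    miss_rate = (endpoint_err > miss_threshold).mean()
    return fde, miss_rate
```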
arXiv Detail & Related papers (2024-08-01T08:32:03Z)
- BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space [57.68134574076005]
We present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View latent space for environment modeling.
Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction.
arXiv Detail & Related papers (2024-07-08T07:26:08Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- VAD: Vectorized Scene Representation for Efficient Autonomous Driving [44.070636456960045]
VAD is an end-to-end vectorized paradigm for autonomous driving.
VAD exploits vectorized agent motion and map elements as explicit instance-level planning constraints.
VAD runs much faster than previous end-to-end planning methods.
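
As an illustration only, an instance-level constraint over vectorized outputs can be as simple as a clearance check between the planned ego trajectory and predicted agent waypoints; the function below is a hypothetical sketch, not VAD's actual formulation, and the 1.5 m threshold is an assumption.

```python
import numpy as np

def ego_agent_clearance_violations(ego_plan, agent_trajs, safe_dist=1.5):
    """Check a planned ego trajectory against vectorized agent predictions.

    ego_plan:    (T, 2) planned ego waypoints in meters.
    agent_trajs: (A, T, 2) predicted waypoints for A surrounding agents.
    safe_dist:   minimum allowed ego-agent distance (assumed value).
    Returns a boolean mask (A, T): True where the plan gets too close.
    """
    dists = np.linalg.norm(agent_trajs - ego_plan[None, :, :], axis=-1)  # (A, T)
    return dists < safe_dist
```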
arXiv Detail & Related papers (2023-03-21T17:59:22Z)
- Generating Evidential BEV Maps in Continuous Driving Space [13.073542165482566]
We propose a complete probabilistic model named GevBEV.
It interprets the 2D driving space as a probabilistic Bird's Eye View (BEV) map with point-based spatial Gaussian distributions.
GevBEV helps reduce communication overhead by selecting only the most important information to share from the learned uncertainty.
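
A toy sketch of the point-based spatial Gaussian idea: each observed point spreads evidence onto nearby BEV cells through an isotropic Gaussian kernel. Grid size, resolution, and bandwidth below are assumptions, not GevBEV's actual parameterization.

```python
import numpy as np

def gaussian_bev_map(points, grid_size=100, resolution=0.5, sigma=1.0):
    """Accumulate point evidence into a BEV map with Gaussian kernels.

    points:     (N, 2) point coordinates in meters, roughly ego-centered.
    grid_size:  number of cells per side (assumed).
    resolution: meters per cell (assumed).
    sigma:      Gaussian bandwidth in meters (assumed).
    """
    half = grid_size * resolution / 2.0
    xs = np.linspace(-half, half, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")   # cell centers, (grid_size, grid_size)
    bev = np.zeros((grid_size, grid_size))
    for px, py in points:
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        bev += np.exp(-d2 / (2.0 * sigma ** 2))
    return bev / max(len(points), 1)  # normalized evidence per cell
```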
arXiv Detail & Related papers (2023-02-06T17:05:50Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework designed for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
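
Self-supervised geometric pretraining of this kind usually rests on a photometric reconstruction objective: warp one frame into the other using predicted depth and pose, then penalize the appearance difference. The sketch below is a simplified monodepth-style version (L1 only, no SSIM or auto-masking) and is not PPGeo's exact loss.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """L1 photometric error after warping `source` into the `target` view.

    target, source: (B, 3, H, W) consecutive frames.
    depth:          (B, 1, H, W) predicted depth for the target frame.
    pose:           (B, 4, 4) predicted target-to-source camera transform.
    K:              (B, 3, 3) camera intrinsics.
    """
    b, _, h, w = target.shape
    device = target.device
    ys, xs = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones]).float().view(1, 3, -1).expand(b, -1, -1)  # (B, 3, H*W)

    # Back-project target pixels to 3D, move them into the source camera, re-project.
    cam = (torch.linalg.inv(K) @ pix) * depth.reshape(b, 1, -1)          # (B, 3, H*W)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, device=device)], 1)  # homogeneous coords
    src_cam = (pose @ cam_h)[:, :3]
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)            # (B, 2, H*W)

    # Normalize pixel coordinates to [-1, 1] and warp the source frame.
    gx = 2.0 * src_pix[:, 0] / (w - 1) - 1.0
    gy = 2.0 * src_pix[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```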
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
- PillarFlow: End-to-end Birds-eye-view Flow Estimation for Autonomous Driving [42.8479177012748]
We propose an end-to-end deep learning framework for LiDAR-based flow estimation in bird's eye view (BeV).
Our method takes consecutive point cloud pairs as input and produces a 2-D BeV flow grid describing the dynamic state of each cell.
The experimental results show that the proposed method not only estimates 2-D BeV flow accurately but also improves tracking performance of both dynamic and static objects.
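
To make the per-cell dynamic state concrete, a BEV flow grid can be used, for example, to propagate an occupancy map forward one step. The sketch below is a hypothetical use of such an output (nearest-cell scatter; the resolution value is an assumption), not part of PillarFlow itself.

```python
import numpy as np

def warp_bev_occupancy(occupancy, flow, resolution=0.2):
    """Propagate a BEV occupancy grid forward using a per-cell 2-D flow field.

    occupancy:  (H, W) occupancy at time t (values in [0, 1]).
    flow:       (H, W, 2) estimated displacement per cell in meters (dx, dy).
    resolution: meters per cell (assumed value).
    Returns an (H, W) prediction of occupancy at time t+1.
    """
    h, w = occupancy.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    new_x = np.round(xs + flow[..., 0] / resolution).astype(int)
    new_y = np.round(ys + flow[..., 1] / resolution).astype(int)
    valid = (new_x >= 0) & (new_x < w) & (new_y >= 0) & (new_y < h)
    warped = np.zeros_like(occupancy)
    np.maximum.at(warped, (new_y[valid], new_x[valid]), occupancy[valid])
    return warped
```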
arXiv Detail & Related papers (2020-08-03T20:36:28Z)
- MultiXNet: Multiclass Multistage Multimodal Motion Prediction [27.046311751308775]
MultiXNet is an end-to-end approach for detection and motion prediction based directly on lidar sensor data.
The method was evaluated on large-scale, real-world data collected by a fleet of SDVs in several cities.
arXiv Detail & Related papers (2020-06-03T01:01:48Z)