VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic
Planning
- URL: http://arxiv.org/abs/2402.13243v1
- Date: Tue, 20 Feb 2024 18:55:09 GMT
- Title: VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic
Planning
- Authors: Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang,
Chang Huang, Wenyu Liu, Xinggang Wang
- Abstract summary: VADv2 is an end-to-end driving model based on probabilistic planning.
It runs stably in a fully end-to-end manner, even without the rule-based wrapper.
- Score: 42.681012361021224
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning a human-like driving policy from large-scale driving demonstrations
is promising, but the uncertainty and non-deterministic nature of planning make
it challenging. In this work, to cope with the uncertainty problem, we propose
VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes
multi-view image sequences as input in a streaming manner, transforms sensor
data into environmental token embeddings, outputs a probability distribution
over actions, and samples one action to control the vehicle. Using only camera
sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA
Town05 benchmark, significantly outperforming all existing methods.
It runs stably in a fully end-to-end manner, even without the rule-based
wrapper. Closed-loop demos are presented at https://hgao-cv.github.io/VADv2.
Related papers
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction, and an iterative motion planner.
Experiments conducted on the nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving [4.628774934971078]
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle.
We introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models.
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU.
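
Final Displacement Error and Miss Rate are standard trajectory-prediction metrics; a minimal sketch of how they are commonly computed (the 2 m miss threshold is a common convention and an assumption here, not necessarily this paper's setting):

```python
import numpy as np

def fde_and_miss_rate(pred, gt, miss_threshold=2.0):
    """Final Displacement Error and Miss Rate for predicted trajectories.

    pred, gt: arrays of shape (num_agents, num_timesteps, 2) in meters.
    miss_threshold: endpoint error (m) above which a prediction counts as a
                    miss; 2.0 m is a common convention, assumed here.
    """
    endpoint_err = np.linalg.norm(pred[:, -1] - gt[:, -1], axis=-1)  # (num_agents,)
    fde = endpoint_err.mean()
    miss_rate = (endpoint_err > miss_threshold).mean()
    return fde, miss_rate
```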
arXiv Detail & Related papers (2024-08-01T08:32:03Z)
- BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space [57.68134574076005]
We present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View latent space for environment modeling.
Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction.
arXiv Detail & Related papers (2024-07-08T07:26:08Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- VAD: Vectorized Scene Representation for Efficient Autonomous Driving [44.070636456960045]
VAD is an end-to-end vectorized paradigm for autonomous driving.
VAD exploits vectorized agent motion and map elements as explicit instance-level planning constraints.
VAD runs much faster than previous end-to-end planning methods.
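
As an illustration only, an instance-level constraint over vectorized outputs can be as simple as a clearance check between the planned ego trajectory and predicted agent waypoints; the function below is a hypothetical sketch, not VAD's actual formulation, and the 1.5 m threshold is an assumption.

```python
import numpy as np

def ego_agent_clearance_violations(ego_plan, agent_trajs, safe_dist=1.5):
    """Check a planned ego trajectory against vectorized agent predictions.

    ego_plan:    (T, 2) planned ego waypoints in meters.
    agent_trajs: (A, T, 2) predicted waypoints for A surrounding agents.
    safe_dist:   minimum allowed ego-agent distance (assumed value).
    Returns a boolean mask (A, T): True where the plan gets too close.
    """
    dists = np.linalg.norm(agent_trajs - ego_plan[None, :, :], axis=-1)  # (A, T)
    return dists < safe_dist
```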
arXiv Detail & Related papers (2023-03-21T17:59:22Z)
- Generating Evidential BEV Maps in Continuous Driving Space [13.073542165482566]
We propose a complete probabilistic model named GevBEV.
It interprets the 2D driving space as a probabilistic Bird's Eye View (BEV) map with point-based spatial Gaussian distributions.
GevBEV helps reduce communication overhead by selecting only the most important information to share from the learned uncertainty.
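
A toy sketch of the point-based spatial Gaussian idea: each observed point spreads evidence onto nearby BEV cells through an isotropic Gaussian kernel. Grid size, resolution, and bandwidth below are assumptions, not GevBEV's actual parameterization.

```python
import numpy as np

def gaussian_bev_map(points, grid_size=100, resolution=0.5, sigma=1.0):
    """Accumulate point evidence into a BEV map with Gaussian kernels.

    points:     (N, 2) point coordinates in meters, roughly ego-centered.
    grid_size:  number of cells per side (assumed).
    resolution: meters per cell (assumed).
    sigma:      Gaussian bandwidth in meters (assumed).
    """
    half = grid_size * resolution / 2.0
    xs = np.linspace(-half, half, grid_size)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")   # cell centers, (grid_size, grid_size)
    bev = np.zeros((grid_size, grid_size))
    for px, py in points:
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        bev += np.exp(-d2 / (2.0 * sigma ** 2))
    return bev / max(len(points), 1)  # normalized evidence per cell
```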
arXiv Detail & Related papers (2023-02-06T17:05:50Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework designed for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
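
Self-supervised geometric pretraining of this kind usually rests on a photometric reconstruction objective: warp one frame into the other using predicted depth and pose, then penalize the appearance difference. The sketch below is a simplified monodepth-style version (L1 only, no SSIM or auto-masking) and is not PPGeo's exact loss.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """L1 photometric error after warping `source` into the `target` view.

    target, source: (B, 3, H, W) consecutive frames.
    depth:          (B, 1, H, W) predicted depth for the target frame.
    pose:           (B, 4, 4) predicted target-to-source camera transform.
    K:              (B, 3, 3) camera intrinsics.
    """
    b, _, h, w = target.shape
    device = target.device
    ys, xs = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones]).float().view(1, 3, -1).expand(b, -1, -1)  # (B, 3, H*W)

    # Back-project target pixels to 3D, move them into the source camera, re-project.
    cam = (torch.linalg.inv(K) @ pix) * depth.reshape(b, 1, -1)          # (B, 3, H*W)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, device=device)], 1)  # homogeneous coords
    src_cam = (pose @ cam_h)[:, :3]
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)            # (B, 2, H*W)

    # Normalize pixel coordinates to [-1, 1] and warp the source frame.
    gx = 2.0 * src_pix[:, 0] / (w - 1) - 1.0
    gy = 2.0 * src_pix[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```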
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- IntentNet: Learning to Predict Intention from Raw Sensor Data [86.74403297781039]
In this paper, we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment.
Our multi-task model achieves better accuracy than the respective separate modules while saving computation, which is critical to reducing reaction time in self-driving applications.
arXiv Detail & Related papers (2021-01-20T00:31:52Z)
- PillarFlow: End-to-end Birds-eye-view Flow Estimation for Autonomous Driving [42.8479177012748]
We propose an end-to-end deep learning framework for LiDAR-based flow estimation in bird's eye view (BeV).
Our method takes consecutive point cloud pairs as input and produces a 2-D BeV flow grid describing the dynamic state of each cell.
The experimental results show that the proposed method not only estimates 2-D BeV flow accurately but also improves tracking performance of both dynamic and static objects.
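
To make the per-cell dynamic state concrete, a BEV flow grid can be used, for example, to propagate an occupancy map forward one step. The sketch below is a hypothetical use of such an output (nearest-cell scatter; the resolution value is an assumption), not part of PillarFlow itself.

```python
import numpy as np

def warp_bev_occupancy(occupancy, flow, resolution=0.2):
    """Propagate a BEV occupancy grid forward using a per-cell 2-D flow field.

    occupancy:  (H, W) occupancy at time t (values in [0, 1]).
    flow:       (H, W, 2) estimated displacement per cell in meters (dx, dy).
    resolution: meters per cell (assumed value).
    Returns an (H, W) prediction of occupancy at time t+1.
    """
    h, w = occupancy.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    new_x = np.round(xs + flow[..., 0] / resolution).astype(int)
    new_y = np.round(ys + flow[..., 1] / resolution).astype(int)
    valid = (new_x >= 0) & (new_x < w) & (new_y >= 0) & (new_y < h)
    warped = np.zeros_like(occupancy)
    np.maximum.at(warped, (new_y[valid], new_x[valid]), occupancy[valid])
    return warped
```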
arXiv Detail & Related papers (2020-08-03T20:36:28Z)
- MultiXNet: Multiclass Multistage Multimodal Motion Prediction [27.046311751308775]
MultiXNet is an end-to-end approach for detection and motion prediction based directly on lidar sensor data.
The method was evaluated on large-scale, real-world data collected by a fleet of SDVs in several cities.
arXiv Detail & Related papers (2020-06-03T01:01:48Z)