End-to-End Driving with Online Trajectory Evaluation via BEV World Model
- URL: http://arxiv.org/abs/2504.01941v2
- Date: Wed, 09 Apr 2025 12:01:43 GMT
- Title: End-to-End Driving with Online Trajectory Evaluation via BEV World Model
- Authors: Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang
- Abstract summary: We propose an end-to-end driving framework WoTE, which leverages a BEV World model to predict future BEV states for Trajectory Evaluation. We validate our framework on the NAVSIM benchmark and the closed-loop Bench2Drive benchmark based on the CARLA simulator, achieving state-of-the-art performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end autonomous driving has achieved remarkable progress by integrating perception, prediction, and planning into a fully differentiable framework. Yet, to fully realize its potential, an effective online trajectory evaluation is indispensable to ensure safety. By forecasting the future outcomes of a given trajectory, trajectory evaluation becomes much more effective. This goal can be achieved by employing a world model to capture environmental dynamics and predict future states. Therefore, we propose an end-to-end driving framework WoTE, which leverages a BEV World model to predict future BEV states for Trajectory Evaluation. The proposed BEV world model is latency-efficient compared to image-level world models and can be seamlessly supervised using off-the-shelf BEV-space traffic simulators. We validate our framework on both the NAVSIM benchmark and the closed-loop Bench2Drive benchmark based on the CARLA simulator, achieving state-of-the-art performance. Code is released at https://github.com/liyingyanUCAS/WoTE.
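To make the evaluation loop concrete, below is a minimal PyTorch sketch of the general idea described in the abstract, not the released WoTE implementation: each candidate trajectory conditions a learned BEV transition model, and a scoring head rates the predicted future BEV state. All class and function names (`BEVWorldModel`, `TrajectoryEvaluator`, `select_trajectory`) and dimensions are illustrative assumptions; in the paper, supervision for such an evaluator can come from an off-the-shelf BEV-space traffic simulator, while here the scoring head is left generic.

```python
# Hypothetical sketch (not the released WoTE API): score candidate trajectories
# by predicting future BEV states with a learned world model.
import torch
import torch.nn as nn

class BEVWorldModel(nn.Module):
    """Predicts the next BEV feature map conditioned on an ego trajectory."""
    def __init__(self, bev_dim=256, traj_dim=2, horizon=8):
        super().__init__()
        # Embed the full waypoint sequence as a single action token.
        self.action_embed = nn.Linear(horizon * traj_dim, bev_dim)
        self.transition = nn.TransformerEncoderLayer(
            d_model=bev_dim, nhead=8, batch_first=True)

    def forward(self, bev_tokens, trajectory):
        # bev_tokens: (B, N, C) flattened BEV grid; trajectory: (B, horizon, 2)
        action = self.action_embed(trajectory.flatten(1)).unsqueeze(1)  # (B, 1, C)
        # Let BEV tokens attend to the action token, then drop it.
        return self.transition(torch.cat([action, bev_tokens], dim=1))[:, 1:]

class TrajectoryEvaluator(nn.Module):
    """Maps a predicted future BEV state to a scalar score (e.g. a safety reward)."""
    def __init__(self, bev_dim=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(bev_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, future_bev):
        return self.head(future_bev.mean(dim=1)).squeeze(-1)  # (B,)

def select_trajectory(world_model, evaluator, bev_tokens, candidates):
    """Roll out each candidate through the world model and keep the best-scored one."""
    scores = []
    for traj in candidates:           # candidates: list of (B, horizon, 2) tensors
        future = world_model(bev_tokens, traj)
        scores.append(evaluator(future))
    scores = torch.stack(scores, dim=1)  # (B, num_candidates)
    return scores.argmax(dim=1)          # index of the best candidate per sample
```

Because the rollout happens in compact BEV feature space rather than pixel space, each candidate adds only one extra transition pass, which is the latency argument the abstract makes against image-level world models.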
Related papers
- Unified Human Localization and Trajectory Prediction with Monocular Vision [64.19384064365431]
MonoTransmotion is a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks.
We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios with noisy inputs.
arXiv Detail & Related papers (2025-03-05T14:18:39Z) - ACT-Bench: Towards Action Controllable World Models for Autonomous Driving [2.6749009435602122]
World models have emerged as promising neural simulators for autonomous driving. We develop an open-access evaluation framework, ACT-Bench, for quantifying action fidelity. We demonstrate that the state-of-the-art model does not fully adhere to given instructions, while Terra achieves improved action fidelity.
arXiv Detail & Related papers (2024-12-06T01:06:28Z) - Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies [9.639797094021988]
World Model-based Reinforcement Learning (WMRL) enables sample-efficient policy learning. We propose Imagine-2-Drive, a novel WMRL framework that integrates a high-fidelity world model with a multi-modal diffusion-based policy actor. By training DPA within DiffDreamer, our method enables robust policy learning with minimal online interactions.
arXiv Detail & Related papers (2024-11-15T13:17:54Z) - Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving [15.100104512786107]
Drive-OccWorld adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving. We propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation. Our method can generate plausible and controllable 4D occupancy, paving the way for advancements in driving world generation and end-to-end planning.
arXiv Detail & Related papers (2024-08-26T11:53:09Z) - Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving [4.628774934971078]
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle.
We introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models.
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU.
arXiv Detail & Related papers (2024-08-01T08:32:03Z) - BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space [57.68134574076005]
We present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View latent space for environment modeling.
Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction.
arXiv Detail & Related papers (2024-07-08T07:26:08Z) - Enhancing End-to-End Autonomous Driving with Latent World Model [78.22157677787239]
We propose a novel self-supervised learning approach using the LAtent World model (LAW) for end-to-end driving. LAW predicts future scene features based on current features and ego trajectories. This self-supervised task can be seamlessly integrated into perception-free and perception-based frameworks.
arXiv Detail & Related papers (2024-06-12T17:59:21Z) - Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? [84.17711168595311]
End-to-end autonomous driving has emerged as a promising research direction to target autonomy from a full-stack perspective.
The nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models.
We introduce a new metric to evaluate whether the predicted trajectories adhere to the road.
arXiv Detail & Related papers (2023-12-05T11:32:31Z) - Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [56.381918362410175]
Drive-WM is the first driving world model compatible with existing end-to-end planning models.
Our model generates high-fidelity multiview videos in driving scenes.
arXiv Detail & Related papers (2023-11-29T18:59:47Z) - Learning to Predict Vehicle Trajectories with Model-based Planning [43.27767693429292]
We introduce a novel framework called PRIME, which stands for Prediction with Model-based Planning.
Unlike recent prediction works that utilize neural networks to model scene context, PRIME is designed to generate accurate and feasibility-guaranteed future trajectory predictions.
Our PRIME outperforms state-of-the-art methods in prediction accuracy, feasibility, and robustness under imperfect tracking.
arXiv Detail & Related papers (2021-03-06T04:49:24Z)
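Several entries above share a generate-then-evaluate pattern: PRIME generates feasibility-guaranteed candidates with a model-based planner, and WoTE scores candidates with a world model. The short sketch below illustrates only the model-based generation half under stated assumptions: candidates come from a kinematic bicycle model over a small control grid and are kept only if they stay in drivable space. The drivable-area mask, the `to_grid` mapping, and the parameter ranges are illustrative assumptions, not taken from either paper.

```python
# Hypothetical sketch of model-based candidate generation: roll out a kinematic
# bicycle model over a control grid, keep only trajectories that stay on-road.
import numpy as np

def rollout_bicycle(x, y, yaw, v, steer, accel, dt=0.1, steps=30, wheelbase=2.7):
    """Roll out a simple kinematic bicycle model; returns (steps, 2) waypoints."""
    pts = []
    for _ in range(steps):
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += v / wheelbase * np.tan(steer) * dt
        v = max(0.0, v + accel * dt)
        pts.append((x, y))
    return np.asarray(pts)

def feasible_candidates(state, drivable_mask, to_grid):
    """Generate candidates over a (steer, accel) grid and keep on-road ones.

    `state` is (x, y, yaw, v); `drivable_mask` is a boolean BEV grid of drivable
    space and `to_grid` maps metric (x, y) waypoints to (rows, cols) mask indices.
    Both are assumed interfaces for illustration.
    """
    candidates = []
    for steer in np.linspace(-0.3, 0.3, 7):      # steering angles in radians
        for accel in (-2.0, 0.0, 2.0):           # brake / coast / accelerate
            traj = rollout_bicycle(*state, steer, accel)
            rows, cols = to_grid(traj)
            if drivable_mask[rows, cols].all():  # feasibility: stay on the road
                candidates.append(traj)
    return candidates
```

The surviving candidates are exactly what a learned evaluator such as the one sketched under the WoTE abstract would then score, which is why the two lines of work compose naturally.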