Driving into the Future: Multiview Visual Forecasting and Planning with
World Model for Autonomous Driving
- URL: http://arxiv.org/abs/2311.17918v1
- Date: Wed, 29 Nov 2023 18:59:47 GMT
- Title: Driving into the Future: Multiview Visual Forecasting and Planning with
World Model for Autonomous Driving
- Authors: Yuqi Wang, Jiawei He, Lue Fan, Hongxin Li, Yuntao Chen, Zhaoxiang
Zhang
- Abstract summary: Drive-WM is the first driving world model compatible with existing end-to-end planning models.
Our model generates high-fidelity multiview videos in driving scenes.
- Score: 56.381918362410175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In autonomous driving, predicting future events in advance and evaluating the
foreseeable risks empowers autonomous vehicles to better plan their actions,
enhancing safety and efficiency on the road. To this end, we propose Drive-WM,
the first driving world model compatible with existing end-to-end planning
models. Through a joint spatial-temporal modeling facilitated by view
factorization, our model generates high-fidelity multiview videos in driving
scenes. Building on its powerful generation ability, we showcase the potential
of applying the world model for safe driving planning for the first time.
Particularly, our Drive-WM enables driving into multiple futures based on
distinct driving maneuvers, and determines the optimal trajectory according to
the image-based rewards. Evaluation on real-world driving datasets verifies
that our method could generate high-quality, consistent, and controllable
multiview videos, opening up possibilities for real-world simulations and safe
planning.
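The planning idea in the abstract (roll out several futures, one per candidate maneuver, then pick the trajectory with the best image-based reward) can be sketched as a simple select-by-score loop. This is a minimal illustration, not the Drive-WM implementation: `plan_with_world_model`, `toy_rollout`, and `toy_reward` are hypothetical names, and each "frame" is reduced to a single lateral-offset number where the real system would use generated multiview video.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = float  # assumption: one number stands in for a generated multiview frame


def plan_with_world_model(
    history: Sequence[Frame],
    maneuvers: Sequence[str],
    rollout: Callable[[Sequence[Frame], str], List[Frame]],
    reward: Callable[[Frame], float],
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate maneuver by the mean reward of its predicted future."""
    scores: Dict[str, float] = {}
    for maneuver in maneuvers:
        future = rollout(history, maneuver)  # imagined future for this maneuver
        scores[maneuver] = sum(reward(f) for f in future) / len(future)
    best = max(scores, key=lambda m: scores[m])  # highest mean reward wins
    return best, scores


def toy_rollout(history: Sequence[Frame], maneuver: str) -> List[Frame]:
    """Hypothetical world model: shifts the lateral offset by a fixed step per frame."""
    step = {"left": -1.0, "straight": 0.0, "right": 1.0}[maneuver]
    last = history[-1]
    return [last + step * (t + 1) for t in range(3)]


def toy_reward(frame: Frame) -> float:
    """Hypothetical reward: prefers frames close to a lateral offset of 2.0."""
    return -abs(frame - 2.0)


if __name__ == "__main__":
    best, scores = plan_with_world_model(
        [0.0], ["left", "straight", "right"], toy_rollout, toy_reward
    )
    print(best, scores)
```

In this toy setup the rollout and the reward are passed in as plain callables, so a learned video generator and a learned image-based reward could be swapped in without changing the selection loop.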
Related papers
- A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling.
This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z)
- DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT [33.943125216555316]
We present DrivingWorld, a GPT-style world model for autonomous driving.
We propose a next-state prediction strategy to model temporal coherence between consecutive frames.
We also propose a novel masking strategy and reweighting strategy for token prediction to mitigate long-term drifting issues.
arXiv Detail & Related papers (2024-12-27T07:44:07Z)
- DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers [61.92571851411509]
We introduce a multimodal driving language based on interleaved image and action tokens, and develop DrivingGPT to learn joint world modeling and planning.
Our DrivingGPT demonstrates strong performance in both action-conditioned video generation and end-to-end planning, outperforming strong baselines on large-scale nuPlan and NAVSIM benchmarks.
arXiv Detail & Related papers (2024-12-24T18:59:37Z)
- DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model [65.43473733967038]
We introduce DrivingDojo, the first dataset tailor-made for training interactive world models with complex driving dynamics.
Our dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge.
arXiv Detail & Related papers (2024-10-14T17:19:23Z)
- Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving [15.100104512786107]
Drive-OccWorld adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving.
We propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation.
Our method can generate plausible and controllable 4D occupancy, paving the way for advancements in driving world generation and end-to-end planning.
arXiv Detail & Related papers (2024-08-26T11:53:09Z)
- GenAD: Generalized Predictive Model for Autonomous Driving [75.39517472462089]
We introduce the first large-scale video prediction model in the autonomous driving discipline.
Our model, dubbed GenAD, handles the challenging dynamics in driving scenes with novel temporal reasoning blocks.
It can be adapted into an action-conditioned prediction model or a motion planner, holding great potential for real-world driving applications.
arXiv Detail & Related papers (2024-03-14T17:58:33Z)
- End-to-end Interpretable Neural Motion Planner [78.69295676456085]
We propose a neural motion planner (NMP) for learning to drive autonomously in complex urban scenarios.
We design a holistic model that takes raw LIDAR data and an HD map as input and produces interpretable intermediate representations.
We demonstrate the effectiveness of our approach on real-world driving data captured in several cities in North America.
arXiv Detail & Related papers (2021-01-17T14:16:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.