Related papers: Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

URL: http://arxiv.org/abs/2408.14197v2
Date: Sat, 12 Oct 2024 06:36:28 GMT
Title: Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
Authors: Yu Yang, Jianbiao Mei, Yukai Ma, Siliang Du, Wenqing Chen, Yijie Qian, Yuxiang Feng, Yong Liu,
Abstract summary: Drive-OccWorld adapts a visioncentric- 4D forecasting world model to end-to-end planning for autonomous driving. We propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model. Experiments on the nuScenes dataset demonstrate that our method can generate plausible and controllable 4D occupancy.
Score: 15.100104512786107
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike the aforementioned prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving. Specifically, we first introduce a semantic and motion-conditional normalization in the memory module, which accumulates semantic and dynamic information from historical BEV embeddings. These BEV features are then conveyed to the world decoder for future occupancy and flow forecasting, considering both geometry and spatiotemporal modeling. Additionally, we propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation and facilitate a broader range of downstream applications. Furthermore, we explore integrating the generative capabilities of the 4D world model with end-to-end planning, enabling continuous forecasting of future states and the selection of optimal trajectories using an occupancy-based cost function. Extensive experiments on the nuScenes dataset demonstrate that our method can generate plausible and controllable 4D occupancy, opening new avenues for driving world generation and end-to-end planning.

Related papers

World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model [18.56171397212777]
We present World4Drive, an end-to-end autonomous driving framework that employs vision foundation models to build latent world models.<n>World4Drive achieves state-of-the-art performance without manual perception annotations on open-loop nuScenes and closed-loop NavSim benchmarks.
arXiv Detail & Related papers (2025-07-01T09:36:38Z)
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving [20.197094443215963]
We present DriveX, a self-supervised world model that learns general scene dynamics and holistic representations from driving videos.<n>DriveX introduces Omni Scene Modeling (OSM), a module that unifies multimodal supervision-3D point cloud forecasting, 2D semantic representation, and image generation.<n>For downstream adaptation, we design Future Spatial Attention (FSA), a unified paradigm that dynamically aggregates features from DriveX's predictions to enhance task-specific inference.
arXiv Detail & Related papers (2025-05-25T17:27:59Z)
End-to-End Driving with Online Trajectory Evaluation via BEV World Model [52.10633338584164]
We propose an end-to-end driving framework WoTE, which leverages a BEV World model to predict future BEV states for Trajectory Evaluation. We validate our framework on the NAVSIM benchmark and the closed-loop Bench2Drive benchmark based on the CARLA simulator, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-04-02T17:47:23Z)
The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey [50.62538723793247]
Driving World Model (DWM) focuses on predicting scene evolution during the driving process. DWM methods enable autonomous driving systems to better perceive, understand, and interact with dynamic driving environments.
arXiv Detail & Related papers (2025-02-14T18:43:15Z)
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving [22.832008530490167]
We propose a semi-supervised vision-centric 3D occupancy world model, PreWorld, to leverage the potential of 2D labels. PreWorld achieves competitive performance across 3D occupancy prediction, 4D occupancy forecasting and motion planning tasks.
arXiv Detail & Related papers (2025-02-11T07:12:26Z)
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model [65.43473733967038]
We introduce DrivingDojo, the first dataset tailor-made for training interactive world models with complex driving dynamics. Our dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge.
arXiv Detail & Related papers (2024-10-14T17:19:23Z)
UnO: Unsupervised Occupancy Fields for Perception and Forecasting [33.205064287409094]
Supervised approaches leverage annotated object labels to learn a model of the world. We learn to perceive and forecast a continuous 4D occupancy field with self-supervision from LiDAR data. This unsupervised world model can be easily and effectively transferred to tasks.
arXiv Detail & Related papers (2024-06-12T23:22:23Z)
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving [67.46481099962088]
Current vision-centric pre-training typically relies on either 2D or 3D pre-text tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task. We introduce emphcentricDriveWorld, which is capable of pre-training from multi-camera driving videos in atemporal fashion. DriveWorld delivers promising results on various autonomous driving tasks.
arXiv Detail & Related papers (2024-05-07T15:14:20Z)
GenAD: Generalized Predictive Model for Autonomous Driving [75.39517472462089]
We introduce the first large-scale video prediction model in the autonomous driving discipline. Our model, dubbed GenAD, handles the challenging dynamics in driving scenes with novel temporal reasoning blocks. It can be adapted into an action-conditioned prediction model or a motion planner, holding great potential for real-world driving applications.
arXiv Detail & Related papers (2024-03-14T17:58:33Z)
GenAD: Generative End-to-End Autonomous Driving [13.332272121018285]
GenAD is a generative framework that casts autonomous driving into a generative modeling problem. We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens. We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling.
arXiv Detail & Related papers (2024-02-18T08:21:05Z)
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [56.381918362410175]
Drive-WM is the first driving world model compatible with existing end-to-end planning models. Our model generates high-fidelity multiview videos in driving scenes.
arXiv Detail & Related papers (2023-11-29T18:59:47Z)
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show data-driven traffic simulation can be formulated as a world model. We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving. Experiments on the open motion dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
End-to-end Interpretable Neural Motion Planner [78.69295676456085]
We propose a neural motion planner (NMP) for learning to drive autonomously in complex urban scenarios. We design a holistic model that takes as input raw LIDAR data and a HD map and produces interpretable intermediate representations. We demonstrate the effectiveness of our approach in real-world driving data captured in several cities in North America.
arXiv Detail & Related papers (2021-01-17T14:16:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.