GenAD: Generalized Predictive Model for Autonomous Driving
- URL: http://arxiv.org/abs/2403.09630v2
- Date: Thu, 8 Aug 2024 11:38:21 GMT
- Title: GenAD: Generalized Predictive Model for Autonomous Driving
- Authors: Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li
- Abstract summary: We introduce the first large-scale video prediction model in the autonomous driving discipline.
Our model, dubbed GenAD, handles the challenging dynamics in driving scenes with novel temporal reasoning blocks.
It can be adapted into an action-conditioned prediction model or a motion planner, holding great potential for real-world driving applications.
- Score: 75.39517472462089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the first large-scale video prediction model in the autonomous driving discipline. To eliminate the restriction of high-cost data collection and empower the generalization ability of our model, we acquire massive data from the web and pair it with diverse and high-quality text descriptions. The resultant dataset accumulates over 2000 hours of driving videos, spanning areas all over the world with diverse weather conditions and traffic scenarios. Inheriting the merits from recent latent diffusion models, our model, dubbed GenAD, handles the challenging dynamics in driving scenes with novel temporal reasoning blocks. We showcase that it can generalize to various unseen driving datasets in a zero-shot manner, surpassing general or driving-specific video prediction counterparts. Furthermore, GenAD can be adapted into an action-conditioned prediction model or a motion planner, holding great potential for real-world driving applications.
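The abstract does not detail how the temporal reasoning blocks are wired into the latent diffusion backbone. As a rough illustration only, the sketch below shows one common way such a block could look in PyTorch: self-attention applied across the frame axis of a latent video tensor, of the kind typically interleaved with the spatial layers of a diffusion U-Net. The class name, tensor layout, and hyperparameters are assumptions for illustration, not GenAD's actual implementation.

```python
# Hypothetical sketch of a temporal reasoning block for a latent video
# diffusion model (illustrative only; not GenAD's published architecture).
import torch
import torch.nn as nn


class TemporalReasoningBlock(nn.Module):
    """Self-attention over the time axis of video latents.

    Expects latents of shape (batch, frames, channels, height, width),
    e.g. as produced by a per-frame VAE encoder in a latent diffusion model.
    """

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = x.shape
        # Treat every spatial location as an independent sequence over time.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        q = self.norm(tokens)
        attended, _ = self.attn(q, q, q)
        tokens = tokens + attended  # residual connection
        return tokens.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)


# Example: 2 clips of 8 frames with 64-channel, 16x16 latents.
latents = torch.randn(2, 8, 64, 16, 16)
block = TemporalReasoningBlock(channels=64)
print(block(latents).shape)  # torch.Size([2, 8, 64, 16, 16])
```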
Related papers
- DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model [65.43473733967038]
We introduce DrivingDojo, the first dataset tailor-made for training interactive world models with complex driving dynamics.
Our dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge.
arXiv Detail & Related papers (2024-10-14T17:19:23Z)
- GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model [6.144680854063938]
GenDDS is a novel approach for generating driving scenarios for autonomous driving systems.
We employ the KITTI dataset, which includes real-world driving videos, to train the model.
We demonstrate that our model can generate high-quality driving videos that closely replicate the complexity and variability of real-world driving scenarios.
arXiv Detail & Related papers (2024-08-28T15:37:44Z)
- Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving [15.100104512786107]
Drive-OccWorld adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving.
We propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model.
Experiments on the nuScenes dataset demonstrate that our method can generate plausible and controllable 4D occupancy.
arXiv Detail & Related papers (2024-08-26T11:53:09Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- GenAD: Generative End-to-End Autonomous Driving [13.332272121018285]
GenAD is a generative framework that casts autonomous driving into a generative modeling problem.
We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens.
We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling.
arXiv Detail & Related papers (2024-02-18T08:21:05Z)
- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [56.381918362410175]
Drive-WM is the first driving world model compatible with existing end-to-end planning models.
Our model generates high-fidelity multiview videos in driving scenes.
arXiv Detail & Related papers (2023-11-29T18:59:47Z)
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show data-driven traffic simulation can be formulated as a world model.
We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.
Experiments on the Waymo Open Motion Dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.