GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
- URL: http://arxiv.org/abs/2503.20523v1
- Date: Wed, 26 Mar 2025 13:11:35 GMT
- Title: GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
- Authors: Lloyd Russell, Anthony Hu, Lorenzo Bertoni, George Fedoseev, Jamie Shotton, Elahe Arani, Gianluca Corrado,
- Abstract summary: GAIA-2 is a latent diffusion world model that unifies capabilities within a single generative framework.<n>GAIA-2 supports controllable video generation conditioned on a rich set of structured inputs.<n>It generates high-resolution,temporally consistent multi-camera videos across geographically diverse driving environments.
- Score: 16.101356013671833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models offer a scalable and flexible paradigm for simulating complex environments, yet current approaches fall short in addressing the domain-specific requirements of autonomous driving - such as multi-agent interactions, fine-grained control, and multi-camera consistency. We introduce GAIA-2, Generative AI for Autonomy, a latent diffusion world model that unifies these capabilities within a single generative framework. GAIA-2 supports controllable video generation conditioned on a rich set of structured inputs: ego-vehicle dynamics, agent configurations, environmental factors, and road semantics. It generates high-resolution, spatiotemporally consistent multi-camera videos across geographically diverse driving environments (UK, US, Germany). The model integrates both structured conditioning and external latent embeddings (e.g., from a proprietary driving model) to facilitate flexible and semantically grounded scene synthesis. Through this integration, GAIA-2 enables scalable simulation of both common and rare driving scenarios, advancing the use of generative world models as a core tool in the development of autonomous systems. Videos are available at https://wayve.ai/thinking/gaia-2.
Related papers
- InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation [53.47253633654885]
InstaDrive is a novel framework that enhances driving video realism through two key advancements.<n>By incorporating these instance-aware mechanisms, InstaDrive achieves state-of-the-art video generation quality.<n>Our project page is https://shanpoyang654.io/InstaDrive/page.html.
arXiv Detail & Related papers (2026-02-03T08:22:13Z) - DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving [49.11389494068169]
We present DrivingGen, the first comprehensive benchmark for generative driving world models.<n>DrivingGen combines a diverse evaluation dataset curated from both driving datasets and internet-scale video sources.<n>General models look better but break physics, while driving-specific ones capture motion realistically but lag in visual quality.
arXiv Detail & Related papers (2026-01-04T13:36:21Z) - HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving [59.55918581964678]
HybridWorldSim is a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents.<n>We release a new multi-traversal dataset MIRROR that captures a wide range of routes and environmental conditions across different cities.
arXiv Detail & Related papers (2025-11-27T07:53:16Z) - Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models [33.32483442886097]
We propose novel statistical measures leveraging E2E drivers to evaluate the realism of generated videos.<n>We show that synthetic data produced by the video generation model offers a cost-effective alternative to real-world data collection.
arXiv Detail & Related papers (2025-10-07T17:58:32Z) - Generative AI for Autonomous Driving: Frontiers and Opportunities [145.6465312554513]
This survey delivers a comprehensive synthesis of the emerging role of GenAI across the autonomous driving stack.<n>We begin by distilling the principles and trade-offs of modern generative modeling, encompassing VAEs, GANs, Diffusion Models, and Large Language Models.<n>We categorize practical applications, such as synthetic data generalization, end-to-end driving strategies, high-fidelity digital twin systems, smart transportation networks, and cross-domain transfer to embodied AI.
arXiv Detail & Related papers (2025-05-13T17:59:20Z) - Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control [98.20899250251792]
We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs.<n>We conduct evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics2Real and autonomous vehicle data enrichment.
arXiv Detail & Related papers (2025-03-18T17:57:54Z) - A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling.
This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey [61.39993881402787]
World models and video generation are pivotal technologies in the domain of autonomous driving.
This paper investigates the relationship between these two technologies.
By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions.
arXiv Detail & Related papers (2024-11-05T08:58:35Z) - GenAD: Generalized Predictive Model for Autonomous Driving [75.39517472462089]
We introduce the first large-scale video prediction model in the autonomous driving discipline.
Our model, dubbed GenAD, handles the challenging dynamics in driving scenes with novel temporal reasoning blocks.
It can be adapted into an action-conditioned prediction model or a motion planner, holding great potential for real-world driving applications.
arXiv Detail & Related papers (2024-03-14T17:58:33Z) - Drive Anywhere: Generalizable End-to-end Autonomous Driving with
Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z) - GAIA-1: A Generative World Model for Autonomous Driving [9.578453700755318]
We introduce GAIA-1 ('Generative AI for Autonomy'), a generative world model that generates realistic driving scenarios.
Emerging properties from our model include learning high-level structures and scene dynamics, contextual awareness, generalization, and understanding of geometry.
arXiv Detail & Related papers (2023-09-29T09:20:37Z) - Isolating and Leveraging Controllable and Noncontrollable Visual
Dynamics in World Models [65.97707691164558]
We present Iso-Dream, which improves the Dream-to-Control framework in two aspects.
First, by optimizing inverse dynamics, we encourage world model to learn controllable and noncontrollable sources.
Second, we optimize the behavior of the agent on the decoupled latent imaginations of the world model.
arXiv Detail & Related papers (2022-05-27T08:07:39Z) - Cycle-Consistent World Models for Domain Independent Latent Imagination [0.0]
High costs and risks make it hard to train autonomous cars in the real world.
We propose a novel model-based reinforcement learning approach called Cycleconsistent World Models.
arXiv Detail & Related papers (2021-10-02T13:55:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.