Related papers: Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

URL: http://arxiv.org/abs/2411.02914v1
Date: Tue, 05 Nov 2024 08:58:35 GMT
Title: Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
Authors: Ao Fu, Yi Zhou, Tao Zhou, Yi Yang, Bojun Gao, Qun Li, Guobin Wu, Ling Shao,
Abstract summary: World models and video generation are pivotal technologies in the domain of autonomous driving. This paper investigates the relationship between these two technologies. By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions.
Score: 61.39993881402787
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: World models and video generation are pivotal technologies in the domain of autonomous driving, each playing a critical role in enhancing the robustness and reliability of autonomous systems. World models, which simulate the dynamics of real-world environments, and video generation models, which produce realistic video sequences, are increasingly being integrated to improve situational awareness and decision-making capabilities in autonomous vehicles. This paper investigates the relationship between these two technologies, focusing on how their structural parallels, particularly in diffusion-based models, contribute to more accurate and coherent simulations of driving scenarios. We examine leading works such as JEPA, Genie, and Sora, which exemplify different approaches to world model design, thereby highlighting the lack of a universally accepted definition of world models. These diverse interpretations underscore the field's evolving understanding of how world models can be optimized for various autonomous driving tasks. Furthermore, this paper discusses the key evaluation metrics employed in this domain, such as Chamfer distance for 3D scene reconstruction and Fr\'echet Inception Distance (FID) for assessing the quality of generated video content. By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions, emphasizing the potential of these technologies to jointly advance the performance of autonomous driving systems. The findings presented in this paper aim to provide a comprehensive understanding of how the integration of video generation and world models can drive innovation in the development of safer and more reliable autonomous vehicles.

Related papers

The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey [50.62538723793247]
Driving World Model (DWM) focuses on predicting scene evolution during the driving process. DWM methods enable autonomous driving systems to better perceive, understand, and interact with dynamic driving environments.
arXiv Detail & Related papers (2025-02-14T18:43:15Z)
A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z)
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model [65.43473733967038]
We introduce DrivingDojo, the first dataset tailor-made for training interactive world models with complex driving dynamics. Our dataset features video clips with a complete set of driving maneuvers, diverse multi-agent interplay, and rich open-world driving knowledge.
arXiv Detail & Related papers (2024-10-14T17:19:23Z)
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving [12.004604110512421]
Vision language models (VLMs) are emerging as revolutionary tools with significant potential to influence autonomous driving. We propose the DriveGenVLM framework to generate driving videos and use VLMs to understand them.
arXiv Detail & Related papers (2024-08-29T15:52:56Z)
Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving. Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [101.15395503285804]
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI) In this survey, we embark on a comprehensive exploration of the latest advancements in world models. We examine challenges and limitations of world models, and discuss their potential future directions.
arXiv Detail & Related papers (2024-05-06T14:37:07Z)
World Models for Autonomous Driving: An Initial Survey [16.448614804069674]
The capability to accurately predict future events and assess their implications is paramount for both safety and efficiency. World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data. This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving.
arXiv Detail & Related papers (2024-03-05T03:23:55Z)
Beyond One Model Fits All: Ensemble Deep Learning for Autonomous Vehicles [16.398646583844286]
This study introduces three distinct neural network models corresponding to Mediated Perception, Behavior Reflex, and Direct Perception approaches. Our architecture fuses information from the base, future latent vector prediction, and auxiliary task networks, using global routing commands to select appropriate action sub-networks.
arXiv Detail & Related papers (2023-12-10T04:40:02Z)
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [56.381918362410175]
Drive-WM is the first driving world model compatible with existing end-to-end planning models. Our model generates high-fidelity multiview videos in driving scenes.
arXiv Detail & Related papers (2023-11-29T18:59:47Z)
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models. Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.