DreamDrive: Generative 4D Scene Modeling from Street View Images
- URL: http://arxiv.org/abs/2501.00601v2
- Date: Fri, 03 Jan 2025 20:06:37 GMT
- Title: DreamDrive: Generative 4D Scene Modeling from Street View Images
- Authors: Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You, Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang
- Abstract summary: We present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction.
Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references.
We then render 3D-consistent driving videos via Gaussian splatting.
- Score: 55.45852373799639
- Abstract: Synthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize geometry-consistent driving videos through neural rendering, but their dependence on costly object annotations limits their ability to generalize to in-the-wild driving scenarios. On the other hand, generative models can synthesize action-conditioned driving videos in a more generalizable way but often struggle with maintaining 3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction, to synthesize generalizable 4D driving scenes and dynamic driving videos with 3D consistency. Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references and further elevate them to 4D with a novel hybrid Gaussian representation. Given a driving trajectory, we then render 3D-consistent driving videos via Gaussian splatting. The use of generative priors allows our method to produce high-quality 4D scenes from in-the-wild driving data, while neural rendering ensures 3D-consistent video generation from the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate that DreamDrive can generate controllable and generalizable 4D driving scenes, synthesize novel views of driving videos with high fidelity and 3D consistency, decompose static and dynamic elements in a self-supervised manner, and enhance perception and planning tasks for autonomous driving.
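The abstract describes a pipeline that synthesizes visual references with a video diffusion model, lifts them into a hybrid Gaussian representation (static background plus dynamic elements), and splats that scene along a driving trajectory. The paper's code is not included here, so the following is only a minimal, hypothetical sketch of the hybrid-representation idea: the names (HybridGaussianScene, project) and the linear motion model are illustrative assumptions, and the toy pinhole projection stands in for a real differentiable Gaussian-splatting rasterizer.

```python
# Toy sketch of a hybrid static/dynamic Gaussian scene, loosely following the
# abstract's description. Hypothetical illustration, not the authors' API.
import numpy as np

class HybridGaussianScene:
    """A 4D scene as static Gaussians plus time-deformed dynamic Gaussians."""

    def __init__(self, static_means, dynamic_means, dynamic_velocities):
        self.static_means = static_means              # (Ns, 3) fixed world positions
        self.dynamic_means = dynamic_means            # (Nd, 3) positions at t = 0
        self.dynamic_velocities = dynamic_velocities  # (Nd, 3) toy linear motion model

    def gaussians_at(self, t):
        """Return all Gaussian centers at time t (static plus deformed dynamic)."""
        moved = self.dynamic_means + t * self.dynamic_velocities
        return np.concatenate([self.static_means, moved], axis=0)

def project(points, cam_pos, focal=500.0):
    """Toy pinhole projection of 3D centers to 2D, standing in for splatting."""
    rel = points - cam_pos
    z = np.clip(rel[:, 2:3], 1e-6, None)  # guard against division by zero
    return focal * rel[:, :2] / z

# Usage: render a short ego trajectory as one projected point set per frame.
rng = np.random.default_rng(0)
scene = HybridGaussianScene(
    static_means=rng.normal(size=(100, 3)) + [0, 0, 20],
    dynamic_means=rng.normal(size=(10, 3)) + [0, 0, 15],
    dynamic_velocities=np.tile([0.5, 0.0, 0.0], (10, 1)),
)
trajectory = [np.array([0.2 * t, 0.0, 0.5 * t]) for t in range(8)]
frames = [project(scene.gaussians_at(t), cam) for t, cam in enumerate(trajectory)]
print(frames[0].shape)  # (110, 2): projected centers for the first frame
```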
Related papers
- Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model [83.31688383891871]
We propose a Spatial-Temporal simulAtion for drivinG (Stag-1) model to reconstruct real-world scenes.
Stag-1 constructs continuous 4D point cloud scenes using surround-view data from autonomous vehicles.
It decouples spatial-temporal relationships and produces coherent driving videos.
arXiv Detail & Related papers (2024-12-06T18:59:56Z)
- PaintScene4D: Consistent 4D Scene Generation from Text Prompts [29.075849524496707]
PaintScene4D is a novel text-to-4D scene generation framework.
It harnesses video generative models trained on diverse real-world datasets.
It produces realistic 4D scenes that can be viewed from arbitrary trajectories.
arXiv Detail & Related papers (2024-12-05T18:59:57Z)
- InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models [75.03495065452955]
We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability.
Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2024-12-05T07:32:20Z)
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving [62.54220021308464]
We propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving.
OccSora can generate 16-second videos with authentic 3D layout and temporal consistency, demonstrating its ability to understand the spatial and temporal distributions of driving scenes.
arXiv Detail & Related papers (2024-05-30T17:59:42Z)
- MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes [72.02827211293736]
We introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation.
Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data.
Our results demonstrate the framework's superior performance, showcasing its potential for autonomous driving simulation and beyond.
arXiv Detail & Related papers (2024-05-23T12:04:51Z)
- TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
arXiv Detail & Related papers (2024-03-26T17:55:11Z)
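The TC4D entry above describes factoring motion into a global trajectory and local deformations. The sketch below is a hypothetical illustration of that factoring, not TC4D's actual method: the sinusoidal deformation is a stand-in for the paper's learned, text-to-video-supervised deformation field, and all function names are assumptions.

```python
# Toy sketch of trajectory-conditioned motion factoring: a global rigid
# trajectory composed with a local per-point deformation.
import numpy as np

def global_transform(points, t, trajectory_fn):
    """Translate the whole object along a global trajectory at time t."""
    return points + trajectory_fn(t)

def local_deformation(points, t, amplitude=0.05):
    """Toy per-point local motion (placeholder for a learned deformation field)."""
    offsets = amplitude * np.sin(2 * np.pi * t + points[:, 0:1])
    zeros = np.zeros_like(offsets)
    return points + np.concatenate([zeros, offsets, zeros], axis=1)

def animate(points, t, trajectory_fn):
    """Factored motion: apply the local deformation, then the global trajectory."""
    return global_transform(local_deformation(points, t), t, trajectory_fn)

# Usage with a straight-line global trajectory.
pts = np.random.default_rng(1).normal(size=(50, 3))
line = lambda t: np.array([t, 0.0, 0.0])
frame = animate(pts, t=0.5, trajectory_fn=line)
print(frame.shape)  # (50, 3)
```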