InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
- URL: http://arxiv.org/abs/2412.03934v1
- Date: Thu, 05 Dec 2024 07:32:20 GMT
- Title: InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
- Authors: Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, Jiahui Huang
- Abstract summary: We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability.
Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
- Abstract: We present InfiniCube, a scalable method for generating unbounded dynamic 3D driving scenes with high fidelity and controllability. Previous methods for scene generation either suffer from limited scales or lack geometric and appearance consistency along generated sequences. In contrast, we leverage the recent advancements in scalable 3D representation and video models to achieve large dynamic scene generation that allows flexible controls through HD maps, vehicle bounding boxes, and text descriptions. First, we construct a map-conditioned sparse-voxel-based 3D generative model to unleash its power for unbounded voxel world generation. Then, we re-purpose a video model and ground it on the voxel world through a set of carefully designed pixel-aligned guidance buffers, synthesizing a consistent appearance. Finally, we propose a fast feed-forward approach that employs both voxel and pixel branches to lift the dynamic videos to dynamic 3D Gaussians with controllable objects. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
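The abstract outlines a three-stage pipeline: (1) map-conditioned sparse-voxel generation of an unbounded world, (2) a video model grounded on that world through pixel-aligned guidance buffers, and (3) feed-forward lifting of the generated video into dynamic 3D Gaussians. As a reading aid only, the minimal Python sketch below traces that data flow; every class and function name is a hypothetical placeholder (the listing includes no code), and the stubs merely stand in for the actual models.

```python
# Hypothetical sketch of the three-stage pipeline described in the abstract.
# All names below are placeholder stubs, not the authors' API; they only make
# the data flow between the stages explicit and runnable.

from dataclasses import dataclass
from typing import Any, List


@dataclass
class SceneConditions:
    hd_map: Any          # lane/road layout used to condition voxel generation
    vehicle_boxes: List  # per-frame 3D bounding boxes of dynamic vehicles
    text_prompt: str     # appearance / weather / style description


# --- placeholder stubs standing in for the three stages ---------------------
def sparse_voxel_generator(hd_map):                 # Stage 1: unbounded voxel world
    return {"voxels": "sparse voxel grid conditioned on the HD map"}


def render_guidance_buffers(voxel_world, boxes):    # pixel-aligned guidance buffers
    return {"depth": "...", "semantics": "..."}     # rendered along the camera path


def guided_video_model(buffers, prompt):            # Stage 2: world-guided video model
    return {"frames": f"video consistent with buffers, prompt={prompt!r}"}


def lift_to_dynamic_gaussians(video, voxel_world, boxes):   # Stage 3: Gaussian lifting
    return {"gaussians": "dynamic 3D Gaussians with controllable objects"}


def generate_scene(cond: SceneConditions):
    voxel_world = sparse_voxel_generator(cond.hd_map)
    buffers = render_guidance_buffers(voxel_world, cond.vehicle_boxes)
    video = guided_video_model(buffers, cond.text_prompt)
    return lift_to_dynamic_gaussians(video, voxel_world, cond.vehicle_boxes)


if __name__ == "__main__":
    cond = SceneConditions(hd_map=None, vehicle_boxes=[], text_prompt="rainy dusk")
    print(generate_scene(cond))
```

The point the sketch highlights is that the voxel world from the first stage is reused twice: it supplies the guidance buffers that ground the video model, and it serves as the geometry (voxel) branch alongside the pixel branch when lifting the video to dynamic 3D Gaussians.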
Related papers
- DreamDrive: Generative 4D Scene Modeling from Street View Images [55.45852373799639]
We present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction.
Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references.
We then render 3D-consistent driving videos via Gaussian splatting.
arXiv Detail & Related papers (2024-12-31T18:59:57Z)
- MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes [72.02827211293736]
We introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation.
Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data.
Our results demonstrate the framework's superior performance, showcasing its potential for autonomous driving simulation and beyond.
arXiv Detail & Related papers (2024-05-23T12:04:51Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content with higher quality by exploiting the natural image prior of the 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [40.71940056121056]
We present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models.
We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
arXiv Detail & Related papers (2023-12-03T14:17:11Z)
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models [85.20004959780132]
We introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments.
We show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
arXiv Detail & Related papers (2023-04-19T16:13:21Z)
- Persistent Nature: A Generative Model of Unbounded 3D Worlds [74.51149070418002]
We present an extendable, planar scene layout grid that can be rendered from arbitrary camera poses via a 3D decoder and volume rendering.
Based on this representation, we learn a generative world model solely from single-view internet photos.
Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also supporting a persistent, camera-independent world representation.
arXiv Detail & Related papers (2023-03-23T17:59:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.