Related papers: Persistent Nature: A Generative Model of Unbounded 3D Worlds

URL: http://arxiv.org/abs/2303.13515v1
Date: Thu, 23 Mar 2023 17:59:40 GMT
Title: Persistent Nature: A Generative Model of Unbounded 3D Worlds
Authors: Lucy Chai, Richard Tucker, Zhengqi Li, Phillip Isola, Noah Snavely
Abstract summary: We present an extendable, planar scene layout grid that can be rendered from arbitrary camera poses via a 3D decoder and volume rendering. Based on this representation, we learn a generative world model solely from single-view internet photos. Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also supporting a persistent, camera-independent world representation.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite increasingly realistic image quality, recent 3D image generative models often operate on 3D volumes of fixed extent with limited camera motions. We investigate the task of unconditionally synthesizing unbounded nature scenes, enabling arbitrarily large camera motion while maintaining a persistent 3D world model. Our scene representation consists of an extendable, planar scene layout grid, which can be rendered from arbitrary camera poses via a 3D decoder and volume rendering, and a panoramic skydome. Based on this representation, we learn a generative world model solely from single-view internet photos. Our method enables simulating long flights through 3D landscapes while maintaining global scene consistency; for instance, returning to the starting point yields the same view of the scene. Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also supporting a persistent, camera-independent world representation that stands in contrast to auto-regressive 3D prediction models. Project page: https://chail.github.io/persistent-nature/.
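The abstract does not spell out the mechanics, so the following is only a minimal Python sketch under our own assumptions, not the authors' implementation. Every name in it is hypothetical: `LayoutGrid` is an extendable ground-plane grid whose tile features are derived deterministically from a seed and the tile index, `decode` is a toy stand-in for the learned 3D decoder, and the constant `SKY_COLOR` stands in for the panoramic skydome. Rays are rendered with standard emission-absorption (NeRF-style) alpha compositing, with leftover transmittance assigned to the sky. The point it illustrates is why a camera-independent world representation is persistent: flying away and coming back re-reads the same tiles, so the starting view is reproduced exactly.

```python
import hashlib
import math
import random

TILE_SIZE = 8.0   # hypothetical: world units covered by one layout tile
SKY_COLOR = 0.9   # hypothetical: constant stand-in for a panoramic skydome


class LayoutGrid:
    """Hypothetical extendable planar layout grid on the ground (x-z) plane.

    Tile features depend only on (seed, i, j), so the grid can grow in any
    direction and a revisited tile has the same content regardless of path.
    """

    def __init__(self, seed: int, feature_dim: int = 4):
        self.seed = seed
        self.feature_dim = feature_dim
        self.tiles = {}  # (i, j) -> list of floats, filled lazily

    def _make_tile(self, i: int, j: int) -> list:
        # Hash the global seed and tile index into a per-tile RNG seed.
        key = f"{self.seed}:{i}:{j}".encode()
        tile_seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        rng = random.Random(tile_seed)
        return [rng.uniform(-1.0, 1.0) for _ in range(self.feature_dim)]

    def feature_at(self, x: float, z: float) -> list:
        # Map a ground-plane position to its tile, creating the tile on demand.
        idx = (math.floor(x / TILE_SIZE), math.floor(z / TILE_SIZE))
        if idx not in self.tiles:
            self.tiles[idx] = self._make_tile(*idx)
        return self.tiles[idx]


def decode(feature, y):
    """Toy stand-in for a learned 3D decoder: (layout feature, height) -> (density, color)."""
    terrain_height = 2.0 + 1.5 * feature[0]   # in [0.5, 3.5]
    density = 5.0 if y < terrain_height else 0.0
    color = 0.5 + 0.25 * feature[1]            # grayscale in [0.25, 0.75]
    return density, color


def render_ray(grid, origin, direction, near=0.1, far=60.0, n_samples=128):
    """Emission-absorption volume rendering along one ray (NeRF-style quadrature)."""
    delta = (far - near) / n_samples
    transmittance, pixel = 1.0, 0.0
    for k in range(n_samples):
        t = near + (k + 0.5) * delta
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        density, color = decode(grid.feature_at(x, z), y)
        alpha = 1.0 - math.exp(-density * delta)
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    # Light not absorbed by the terrain comes from the (constant) sky.
    return pixel + transmittance * SKY_COLOR


def render_view(grid, cam_pos, yaw, width=8):
    """Render a one-pixel-tall strip of `width` rays, tilted slightly downward."""
    pixels = []
    for c in range(width):
        angle = yaw + (c / (width - 1) - 0.5) * 0.8  # about 46 degrees horizontal FOV
        d = (math.sin(angle), -0.15, math.cos(angle))
        norm = math.sqrt(sum(v * v for v in d))
        pixels.append(render_ray(grid, cam_pos, tuple(v / norm for v in d)))
    return pixels


grid = LayoutGrid(seed=0)
start = (0.0, 4.0, 0.0)
before = render_view(grid, start, yaw=0.0)

# Fly far forward; new tiles are generated lazily as the camera sees them.
for step in range(1, 50):
    render_view(grid, (0.0, 4.0, 10.0 * step), yaw=0.0)

after = render_view(grid, start, yaw=0.0)
print(before == after)  # True: the starting view is reproduced exactly
```

In this toy setup, persistence comes from the world state living in the grid rather than in previously rendered frames. An auto-regressive model that predicts each new frame from the last ones carries no such guarantee, since nothing forces the content it regenerates at the starting point to match what it produced the first time.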

Related papers

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation
We present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames.
arXiv (2025-06-04T17:59:04Z)
WorldExplorer: Towards Generating Fully Navigable 3D Scenes
WorldExplorer builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We generate multiple videos along short, predefined trajectories that explore the scene in depth. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results.
arXiv (2025-06-02T15:41:31Z)
Continuous 3D Perception Model with Persistent State
We present a unified framework capable of solving a broad range of 3D tasks. Our approach features a stateful recurrent model that continuously updates its state representation with each new observation. We evaluate our method on various 3D/4D tasks and demonstrate competitive or state-of-the-art performance in each.
arXiv (2025-01-21T18:59:23Z)
PaintScene4D: Consistent 4D Scene Generation from Text Prompts
PaintScene4D is a novel text-to-4D scene generation framework. It harnesses video generative models trained on diverse real-world datasets. It produces realistic 4D scenes that can be viewed from arbitrary trajectories.
arXiv (2024-12-05T18:59:57Z)
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
arXiv (2024-12-05T07:32:20Z)
Generating 3D-Consistent Videos from Unposed Internet Photos
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters. Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv (2024-11-20T18:58:31Z)
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
MagicDrive3D is a novel framework for controllable 3D street scene generation. It supports multi-condition control, including road maps, 3D objects, and text descriptions. It generates diverse, high-quality 3D driving scenes, supports any-view rendering, and enhances downstream tasks like BEV segmentation.
arXiv (2024-05-23T12:04:51Z)
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
We present CAT3D, a method for creating anything in 3D by using a multi-view diffusion model to simulate the real-world process of capturing many photos of a scene. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single-image and few-view 3D scene creation.
arXiv (2024-05-16T17:59:05Z)
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
We propose a generative refinement network that synthesizes new content with higher quality by exploiting the natural image prior of a 2D diffusion model and the global 3D information of the current scene. Our approach supports a wide variety of generated scenes and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv (2024-03-14T14:31:22Z)
Denoising Diffusion via Image-Based Rendering
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes. First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes. Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv (2024-02-05T19:00:45Z)
3inGAN: Learning a 3D Generative Model from Images of a Self-similar Scene
3inGAN is an unconditional 3D generative model trained from 2D images of a single self-similar 3D scene. We show results on semi-stochastic scenes of varying scale and complexity, obtained from real and synthetic sources.
arXiv (2022-11-27T18:03:21Z)
CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields
We learn a 3D- and camera-aware generative model which faithfully recovers not only the image but also the camera data distribution. At test time, our model generates images with explicit control over the camera as well as the shape and appearance of the scene.
arXiv (2021-03-31T17:59:24Z)
Unsupervised object-centric video generation and decomposition in 3D
We propose to model a video as the view seen while moving through a scene with multiple 3D objects and a 3D background. Our model is trained from monocular videos without any supervision, yet learns to generate coherent 3D scenes containing several moving objects.
arXiv (2020-07-07T18:01:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.