SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections
- URL: http://arxiv.org/abs/2302.01330v3
- Date: Thu, 7 Dec 2023 18:58:30 GMT
- Title: SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections
- Authors: Zhaoxi Chen, Guangcong Wang, Ziwei Liu
- Abstract summary: We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
- Score: 49.802462165826554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present SceneDreamer, an unconditional generative model for
unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random
noise. Our framework is learned from in-the-wild 2D image collections only,
without any 3D annotations. At the core of SceneDreamer is a principled
learning paradigm comprising 1) an efficient yet expressive 3D scene
representation, 2) a generative scene parameterization, and 3) an effective
renderer that can leverage the knowledge from 2D images. Our approach begins
with an efficient bird's-eye-view (BEV) representation generated from simplex
noise, which includes a height field for surface elevation and a semantic field
for detailed scene semantics. This BEV scene representation enables 1)
representing a 3D scene with quadratic complexity, 2) disentangled geometry and
semantics, and 3) efficient training. Moreover, we propose a novel generative
neural hash grid to parameterize the latent space based on 3D positions and
scene semantics, aiming to encode generalizable features across various scenes.
Lastly, a neural volumetric renderer, learned from 2D image collections through
adversarial training, is employed to produce photorealistic images. Extensive
experiments demonstrate the effectiveness of SceneDreamer and superiority over
state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
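The abstract outlines three components: a BEV scene representation derived from simplex noise (a height field plus a semantic field), a semantics-conditioned generative neural hash grid, and an adversarially trained volumetric renderer. The snippet below is a minimal, illustrative sketch of the first two ideas only; the function names, resolutions, and hash parameters are assumptions for demonstration and not the authors' implementation, and simplex noise is approximated here with bilinearly interpolated value noise.
```python
# Hedged sketch: a BEV height/semantic field on a 2D ground plane, plus a toy
# multi-resolution hash-grid lookup keyed on quantized 3D position and the local
# semantic label. All names and hyperparameters are illustrative assumptions.
import numpy as np

def smooth_noise(res, grid, seed=0):
    """Stand-in for simplex noise: a random lattice, bilinearly upsampled to `grid`."""
    rng = np.random.default_rng(seed)
    lattice = rng.uniform(-1.0, 1.0, size=(res + 1, res + 1))
    ys, xs = np.meshgrid(np.linspace(0, res, grid, endpoint=False),
                         np.linspace(0, res, grid, endpoint=False), indexing="ij")
    y0, x0 = ys.astype(int), xs.astype(int)
    fy, fx = ys - y0, xs - x0
    top = (1 - fx) * lattice[y0, x0] + fx * lattice[y0, x0 + 1]
    bot = (1 - fx) * lattice[y0 + 1, x0] + fx * lattice[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot

GRID = 256  # assumed BEV resolution
# Height field: a few octaves of smooth noise; semantic field: height thresholds
# standing in for labels such as water / sand / grass / rock / snow.
height = sum(0.5 ** o * smooth_noise(2 ** (o + 2), GRID, seed=o) for o in range(4))
semantics = np.digitize(height, bins=[-0.3, 0.0, 0.3, 0.6])

def hash_grid_feature(xyz, semantic_id, levels=4, table_size=2 ** 14, dim=2, seed=0):
    """Toy generative hash-grid lookup: features are indexed by the quantized 3D
    position and the semantic label, so different semantics map to different latents."""
    rng = np.random.default_rng(seed)
    tables = rng.normal(size=(levels, table_size, dim)).astype(np.float32)
    primes = np.array([1, 2654435761, 805459861, 3674653429], dtype=np.uint64)
    feats = []
    for lvl in range(levels):
        res = 16 * 2 ** lvl                                    # per-level grid resolution
        cell = np.floor(np.asarray(xyz) * res).astype(np.uint64)
        key = np.concatenate([cell, np.array([semantic_id], dtype=np.uint64)])
        idx = int(np.bitwise_xor.reduce(key * primes) % table_size)
        feats.append(tables[lvl, idx])
    return np.concatenate(feats)                               # per-point latent feature

f = hash_grid_feature([0.2, 0.7, 0.1], semantic_id=int(semantics[10, 20]))
print(height.shape, semantics.shape, f.shape)
```
In the paper, per-point features of this kind are decoded by a neural volumetric renderer trained adversarially against in-the-wild 2D image collections; that stage is omitted from the sketch above.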
Related papers
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches [50.51643519253066]
3D Content Generation is at the heart of many computer graphics applications, including video gaming, film-making, virtual and augmented reality, etc.
This paper proposes a novel deep-learning-based approach for automatically generating interactive and playable 3D game scenes.
arXiv Detail & Related papers (2024-08-08T16:27:37Z)
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation [96.58789785954409]
We propose a practical and efficient 3D representation that incorporates an equivariant radiance field with the guidance of a bird's-eye view map.
We produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency.
arXiv Detail & Related papers (2023-12-04T18:56:10Z)
- Blocks2World: Controlling Realistic Scenes with Editable Primitives [5.541644538483947]
We present Blocks2World, a novel method for 3D scene rendering and editing.
Our technique begins by extracting 3D parallelepipeds from various objects in a given scene using convex decomposition.
The next stage involves training a conditioned model that learns to generate images from the 2D-rendered convex primitives.
arXiv Detail & Related papers (2023-07-07T21:38:50Z)
- CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes [68.61199623705096]
We design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations.
We inject the representational power of the popular CLIP model into our 3D encoder by aligning the encoded 3D scene features with the corresponding 2D image and text embeddings.
We evaluate our model's 3D world reasoning capability on the downstream task of 3D Visual Question Answering.
arXiv Detail & Related papers (2023-04-12T16:52:29Z)
- 3inGAN: Learning a 3D Generative Model from Images of a Self-similar Scene [34.2144933185175]
3inGAN is an unconditional 3D generative model trained from 2D images of a single self-similar 3D scene.
We show results on semi-stochastic scenes of varying scale and complexity, obtained from real and synthetic sources.
arXiv Detail & Related papers (2022-11-27T18:03:21Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)