NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion
Models
- URL: http://arxiv.org/abs/2304.09787v1
- Date: Wed, 19 Apr 2023 16:13:21 GMT
- Title: NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion
Models
- Authors: Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja
Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler
- Abstract summary: We introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments.
We show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
- Score: 85.20004959780132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically generating high-quality real world 3D scenes is of enormous
interest for applications such as virtual reality and robotics simulation.
Towards this goal, we introduce NeuralField-LDM, a generative model capable of
synthesizing complex 3D environments. We leverage Latent Diffusion Models that
have been successfully utilized for efficient high-quality 2D content creation.
We first train a scene auto-encoder to express a set of image and pose pairs as
a neural field, represented as density and feature voxel grids that can be
projected to produce novel views of the scene. To further compress this
representation, we train a latent-autoencoder that maps the voxel grids to a
set of latent representations. A hierarchical diffusion model is then fit to
the latents to complete the scene generation pipeline. We achieve a substantial
improvement over existing state-of-the-art scene generation models.
Additionally, we show how NeuralField-LDM can be used for a variety of 3D
content creation applications, including conditional scene generation, scene
inpainting and scene style manipulation.
Related papers
- LT3SD: Latent Trees for 3D Scene Diffusion [71.91446143124648]
We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation.
We demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation.
arXiv Detail & Related papers (2024-09-12T16:55:51Z) - Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE [22.072200443502457]
We propose Scene123, a 3D scene generation model that ensures realism and diversity through the video generation framework.
Specifically, we warp the input image (or an image generated from text) to simulate adjacent views, filling the invisible areas with the MAE model.
To further enhance the details and texture fidelity of generated views, we employ a GAN-based Loss against images derived from the input image through the video generation model.
arXiv Detail & Related papers (2024-08-10T08:09:57Z) - Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models [3.9373541926236766]
We present a latent diffusion model over 3D scenes, that can be trained using only 2D image data.
We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, or from sparse input views.
arXiv Detail & Related papers (2024-06-18T23:14:29Z) - DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior to 2D diffusion model and the global 3D information of the current scene.
Our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-forward manner.
Experiments in two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.