Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting
- URL: http://arxiv.org/abs/2404.19758v1
- Date: Tue, 30 Apr 2024 17:59:40 GMT
- Title: Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting
- Authors: Paul Engstler, Andrea Vedaldi, Iro Laina, Christian Rupprecht
- Abstract summary: We introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process.
We also introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: 3D scene generation has quickly become a challenging new research direction, fueled by consistent improvements of 2D generative diffusion models. Most prior work in this area generates scenes by iteratively stitching newly generated frames with existing geometry. These works often depend on pre-trained monocular depth estimators to lift the generated images into 3D, fusing them with the existing scene representation. These approaches are then often evaluated via a text metric, measuring the similarity between the generated images and a given text prompt. In this work, we make two fundamental contributions to the field of 3D scene generation. First, we note that lifting images to 3D with a monocular depth estimation model is suboptimal as it ignores the geometry of the existing scene. We thus introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process, resulting in improved geometric coherence of the scene. Second, we introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry, and thus measures the quality of the structure of the scene.
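The abstract contrasts lifting frames with a monocular depth estimator (which ignores the existing scene) against depth completion conditioned on the geometry already reconstructed. A minimal NumPy sketch of that conditioning idea, purely illustrative and not the paper's trained model: known pixels keep the depth rendered from the existing scene, the estimator's prediction is aligned to the known region with a least-squares scale/shift fit, and only newly revealed pixels take the aligned prediction.

```python
import numpy as np

def depth_completion_merge(pred_depth, known_depth, known_mask):
    """Fuse a monocular depth prediction with depth rendered from the
    existing scene (hypothetical sketch of the depth-completion idea).

    pred_depth:  (H, W) depth predicted from the new image alone
    known_depth: (H, W) depth rendered from the existing scene geometry
    known_mask:  (H, W) bool, True where the scene already covers the view
    """
    # Fit a per-frame scale/shift mapping the prediction onto the known
    # depth (least squares), a common trick for reconciling relative
    # depth predictions with the metric scale of the scene.
    p = pred_depth[known_mask]
    k = known_depth[known_mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    scale, shift = np.linalg.lstsq(A, k, rcond=None)[0]
    aligned = scale * pred_depth + shift

    # Known pixels keep the scene's geometry exactly; only the newly
    # revealed regions are filled from the aligned prediction.
    return np.where(known_mask, known_depth, aligned)
```

A purely monocular lift would instead return `pred_depth` unchanged, which is exactly the geometric inconsistency the paper's depth completion model is trained to avoid.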
Related papers
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior of the 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z) - Blocks2World: Controlling Realistic Scenes with Editable Primitives [5.541644538483947]
We present Blocks2World, a novel method for 3D scene rendering and editing.
Our technique begins by extracting 3D parallelepipeds from various objects in a given scene using convex decomposition.
The next stage involves training a conditioned model that learns to generate images from the 2D-rendered convex primitives.
arXiv Detail & Related papers (2023-07-07T21:38:50Z) - Neural 3D Scene Reconstruction from Multiple 2D Images without 3D Supervision [41.20504333318276]
We propose a novel neural reconstruction method that reconstructs scenes using sparse depth under the plane constraints without 3D supervision.
We introduce a signed distance function field, a color field, and a probability field to represent a scene.
We optimize these fields to reconstruct the scene by using differentiable ray marching with accessible 2D images as supervision.
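The entry above describes optimizing scene fields via differentiable ray marching against 2D images. A minimal NumPy sketch of the compositing step such field-based methods typically differentiate through; the function and the density-based weighting are illustrative assumptions, not that paper's implementation (which uses signed distances and plane constraints):

```python
import numpy as np

def composite_along_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray (illustrative sketch).

    sigmas: (N,) non-negative densities at the ray samples
    colors: (N, 3) colors at the ray samples
    deltas: (N,) distances between consecutive samples
    """
    # Opacity of each sample from its density and step size.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    # Per-sample contribution; every operation here is differentiable,
    # so gradients from a 2D photometric loss flow back into the fields.
    weights = alphas * trans
    rendered = weights @ colors
    return rendered, weights
```

Because the render is differentiable end to end, comparing `rendered` against the ground-truth pixel color supervises the underlying fields using only 2D images.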
arXiv Detail & Related papers (2023-06-30T13:30:48Z) - Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models [21.622420436349245]
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.
We leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses.
In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model.
arXiv Detail & Related papers (2023-03-21T16:21:02Z) - SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections [49.802462165826554]
We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
arXiv Detail & Related papers (2023-02-02T18:59:16Z) - 3D-Aware Indoor Scene Synthesis with Depth Priors [62.82867334012399]
Existing methods fail to model indoor scenes due to the large diversity of room layouts and of the objects inside them.
We argue that indoor scenes do not have a shared intrinsic structure, and hence 2D images alone cannot adequately guide the model with 3D geometry.
arXiv Detail & Related papers (2022-02-17T09:54:29Z) - Unsupervised High-Fidelity Facial Texture Generation and Reconstruction [20.447635896077454]
We propose a novel unified pipeline for both tasks: the generation of geometry and texture, and the recovery of high-fidelity texture.
Our texture model is learned, in an unsupervised fashion, from natural images as opposed to scanned texture maps.
By applying precise 3DMM fitting, we can seamlessly integrate our modeled textures into synthetically generated background images.
arXiv Detail & Related papers (2021-10-10T10:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.