Inpaint3D: 3D Scene Content Generation using 2D Inpainting Diffusion
- URL: http://arxiv.org/abs/2312.03869v1
- Date: Wed, 6 Dec 2023 19:30:04 GMT
- Title: Inpaint3D: 3D Scene Content Generation using 2D Inpainting Diffusion
- Authors: Kira Prabhu, Jane Wu, Lynn Tsai, Peter Hedman, Dan B Goldman, Ben
Poole, Michael Broxton
- Abstract summary: This paper presents a novel approach to inpainting 3D regions of a scene, given masked multi-view images, by distilling a 2D diffusion model into a learned 3D scene representation (e.g. a NeRF).
We show that this 2D diffusion model can still serve as a generative prior in a 3D multi-view reconstruction problem where we optimize a NeRF using a combination of score distillation sampling and NeRF reconstruction losses.
Because our method can generate content to fill any 3D masked region, we additionally demonstrate 3D object completion, 3D object replacement, and 3D scene completion.
- Score: 18.67196713834323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach to inpainting 3D regions of a scene,
given masked multi-view images, by distilling a 2D diffusion model into a
learned 3D scene representation (e.g. a NeRF). Unlike 3D generative methods
that explicitly condition the diffusion model on camera pose or multi-view
information, our diffusion model is conditioned only on a single masked 2D
image. Nevertheless, we show that this 2D diffusion model can still serve as a
generative prior in a 3D multi-view reconstruction problem where we optimize a
NeRF using a combination of score distillation sampling and NeRF reconstruction
losses. Predicted depth is used as additional supervision to encourage accurate
geometry. We compare our approach to 3D inpainting methods that focus on object
removal. Because our method can generate content to fill any 3D masked region,
we additionally demonstrate 3D object completion, 3D object replacement, and 3D
scene completion.
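To make the optimization described above concrete, here is a minimal, hypothetical sketch of one training step combining the three losses the abstract names: a masked NeRF reconstruction loss on observed pixels, a DreamFusion-style score distillation sampling (SDS) loss driven by the 2D inpainting diffusion model inside the mask, and a depth supervision term. This is not the authors' code; `nerf`, `inpainting_unet`, `depth_predictor`, and all loss weights are assumed placeholders.

```python
# Conceptual sketch (not the authors' released code) of an Inpaint3D-style
# training step. Assumes a NeRF with a differentiable render() method, a
# mask-conditioned 2D inpainting diffusion model, and a monocular depth
# network; all module names and weights are hypothetical.
import torch

def training_step(nerf, inpainting_unet, depth_predictor,
                  image, mask, rays,
                  w_rec=1.0, w_sds=1e-2, w_depth=1e-1):
    # Render the current NeRF estimate (RGB and depth) for one training view.
    rgb, depth = nerf.render(rays)

    # (1) NeRF reconstruction loss on the observed (unmasked) pixels only.
    loss_rec = (((rgb - image) ** 2) * (1.0 - mask)).mean()

    # (2) Score distillation sampling: diffuse the render to a random noise
    # level, let the inpainting model predict the noise, and nudge the render
    # toward its estimate. The detach() trick produces the SDS gradient
    # (eps_hat - eps) * d(rgb)/d(theta) without backprop through the U-Net.
    alpha_bar = torch.rand(())                    # stand-in for a timestep schedule
    eps = torch.randn_like(rgb)
    noisy = alpha_bar.sqrt() * rgb + (1 - alpha_bar).sqrt() * eps
    eps_hat = inpainting_unet(noisy, mask, image * (1.0 - mask), alpha_bar)
    target = (rgb - (eps_hat - eps)).detach()     # gradients flow only through rgb
    loss_sds = (((rgb - target) ** 2) * mask).mean()

    # (3) Depth supervision: match rendered depth to a monocular depth
    # prediction (scale/shift alignment omitted for brevity).
    with torch.no_grad():
        depth_target = depth_predictor(rgb)
    loss_depth = (depth - depth_target).abs().mean()

    return w_rec * loss_rec + w_sds * loss_sds + w_depth * loss_depth
```

Note that only the masked region receives the SDS gradient while unmasked pixels are anchored by the reconstruction loss; this division of labor is what lets a purely 2D, single-image-conditioned diffusion model act as a 3D generative prior.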
Related papers
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes [86.26588382747184]
We introduce InseRF, a novel method for generative object insertion in the NeRF reconstructions of 3D scenes.
Based on a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes.
arXiv Detail & Related papers (2024-01-10T18:59:53Z)
- NeRFiller: Completing Scenes via Generative 3D Inpainting [113.18181179986172]
We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting.
In contrast to related works, we focus on completing scenes rather than deleting foreground objects.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
- ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z)
- 3D-aware Blending with Generative NeRFs [41.10514446851655]
We propose a 3D-aware blending method using generative Neural Radiance Fields (NeRFs).
For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to generative NeRFs, then perform 3D local alignment for each part.
To further leverage 3D information of the generative NeRF, we propose 3D-aware blending that directly blends images on the NeRF's latent representation space, rather than raw pixel space.
arXiv Detail & Related papers (2023-02-13T18:59:52Z)
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z)