Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model
- URL: http://arxiv.org/abs/2306.09551v1
- Date: Thu, 15 Jun 2023 23:41:58 GMT
- Title: Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model
- Authors: Lu Yu, Wei Xiang, Kang Han
- Abstract summary: Combination of pretrained diffusion models with neural radiance fields (NeRFs) has emerged as a promising approach for text-to-3D generation.
We propose the Edit-DiffNeRF framework, which is composed of a frozen diffusion model, a proposed delta module to edit the latent semantic space of the diffusion model, and a NeRF.
Our proposed method has been shown to effectively edit real-world 3D scenes, resulting in a 25% improvement in the alignment of the performed 3D edits with text instructions compared to prior work.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research has demonstrated that combining pretrained diffusion
models with neural radiance fields (NeRFs) is a promising approach for
text-to-3D generation. However, simply coupling a NeRF with a diffusion model
results in cross-view inconsistency and degraded stylized view synthesis.
To address this challenge, we propose the Edit-DiffNeRF framework, which is
composed of a frozen diffusion model, a proposed delta module to edit the
latent semantic space of the diffusion model, and a NeRF. Instead of training
the entire diffusion model for each scene, our method focuses on editing the
latent semantic space of the frozen pretrained diffusion model via the delta
module. This
fundamental change to the standard diffusion framework enables us to make
fine-grained modifications to the rendered views and effectively consolidate
these instructions in a 3D scene via NeRF training. As a result, we are able to
produce an edited 3D scene that faithfully aligns to input text instructions.
Furthermore, to ensure semantic consistency across different viewpoints, we
propose a novel multi-view semantic consistency loss that extracts a latent
semantic embedding from the input view as a prior, and aims to reconstruct it in
different views. Our proposed method has been shown to effectively edit
real-world 3D scenes, resulting in a 25% improvement in the alignment of the
performed 3D edits with text instructions compared to prior work.
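To make the division of labor above concrete, here is a minimal PyTorch sketch of a delta module: a small trainable network that produces an additive edit in the latent semantic space while the pretrained diffusion weights stay frozen. The class name, embedding dimension, and additive form are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DeltaModule(nn.Module):
    """Illustrative stand-in for the paper's delta module (assumed design):
    a small trainable network that shifts the frozen diffusion model's
    latent semantic embedding toward an edit instruction."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, latent: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Predict an additive shift ("delta") in latent semantic space;
        # the pretrained diffusion weights themselves receive no gradients.
        return latent + self.net(torch.cat([latent, text_emb], dim=-1))
```

Training would then optimize only the delta module and the NeRF, e.g. `torch.optim.Adam(delta.parameters(), lr=1e-4)`, which is what makes per-scene editing cheap relative to finetuning the entire diffusion model.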
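The multi-view semantic consistency loss is described only at a high level: an embedding extracted from the input view acts as a prior that every rendered view should reconstruct. A plausible reading is sketched below with a frozen image encoder (e.g. a CLIP image tower) and cosine distance, both of which are assumptions rather than the paper's stated choices.

```python
import torch
import torch.nn.functional as F

def multiview_semantic_consistency_loss(encoder, input_view, rendered_views):
    """Hedged sketch of the loss described above: penalize rendered views
    whose semantic embeddings drift from the input view's embedding.

    encoder        -- frozen image encoder, e.g. a CLIP image tower (assumption)
    input_view     -- reference view, shape (B, 3, H, W)
    rendered_views -- novel views from the NeRF, shape (B, N, 3, H, W)
    """
    with torch.no_grad():
        prior = encoder(input_view)              # (B, D): fixed semantic target
    b, n = rendered_views.shape[:2]
    emb = encoder(rendered_views.flatten(0, 1))  # (B*N, D)
    emb = emb.reshape(b, n, -1)                  # (B, N, D)
    # Cosine distance between each view's embedding and the prior.
    return (1.0 - F.cosine_similarity(emb, prior.unsqueeze(1), dim=-1)).mean()
```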
Related papers
- NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows (arXiv, 2024-06-15)
We present a method for automatically modifying a NeRF representation based on a single observation.
Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations.
We also introduce a new dataset for exploring the problem of modifying a NeRF scene through a single observation.
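The "weighted linear blending of rigid transformations" in this entry has a compact closed form; a sketch follows, under the assumption of per-point weights over K rigid transforms (all names and shapes are illustrative, not NeRFDeformer's code).

```python
import torch

def blend_rigid_transforms(points, weights, rotations, translations):
    """Weighted linear blend of K rigid transforms (illustrative sketch).

    points       -- (P, 3) query points
    weights      -- (P, K) per-point blend weights, each row summing to 1
    rotations    -- (K, 3, 3) rotation matrices
    translations -- (K, 3) translation vectors
    """
    # Apply every rigid transform to every point: (K, P, 3)
    moved = torch.einsum('kij,pj->kpi', rotations, points) + translations[:, None, :]
    # Blend the K candidate positions per point: (P, 3)
    return torch.einsum('pk,kpi->pi', weights, moved)
```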
- Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation (arXiv, 2023-12-19)
We propose a text-to-3D method for Neural Radiance Fields (NeRFs) that explicitly enforces fine-grained view consistency.
Our method achieves state-of-the-art performance over existing text-to-3D methods.
- Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting (arXiv, 2023-12-08)
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
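The "diffusion priors" being improved here are score-distillation-style objectives in the DreamFusion lineage; for context, a schematic of the baseline score distillation sampling (SDS) gradient is sketched below, with the denoiser signature and timestep range assumed.

```python
import torch

def sds_gradient(unet, rendered, text_emb, alphas_cumprod):
    """Schematic SDS step: the baseline diffusion prior this line of work
    refines (illustrative; the unet signature is an assumption)."""
    t = torch.randint(20, 980, (rendered.shape[0],), device=rendered.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)                 # noise-schedule terms
    noise = torch.randn_like(rendered)
    noisy = a.sqrt() * rendered + (1.0 - a).sqrt() * noise  # forward diffusion
    eps_pred = unet(noisy, t, text_emb)                     # frozen denoiser
    # SDS treats (eps_pred - noise) as a gradient on the rendered image,
    # skipping the U-Net Jacobian entirely.
    return (eps_pred - noise).detach()
```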
- RePaint-NeRF: NeRF Editing via Semantic Masks and Diffusion Models (arXiv, 2023-06-09)
We propose a novel framework that can take RGB images as input and alter the 3D content in neural scenes.
Specifically, we semantically select the target object, and a pre-trained diffusion model guides the NeRF model to generate new 3D objects.
Experimental results show that our algorithm is effective for editing 3D objects in NeRF under different text prompts.
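One common way to realize "select the target object, then let the diffusion model guide the NeRF" is to confine the diffusion-guided target to a semantic mask and keep the rest of the scene unchanged; a hedged sketch of that composition follows (an assumed formulation, not RePaint-NeRF's released code).

```python
import torch

def masked_edit_target(original, diffusion_edit, mask):
    """Compose a per-view training target: diffusion guidance inside the
    semantic mask, the unedited render elsewhere (assumed composition).

    original       -- rendered view before editing, (B, 3, H, W)
    diffusion_edit -- diffusion-guided edit of the same view, (B, 3, H, W)
    mask           -- soft semantic mask of the target object, (B, 1, H, W)
    """
    return mask * diffusion_edit + (1.0 - mask) * original
```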
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction (arXiv, 2023-05-24)
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
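The progressive loop this summary describes can be written down generically; the sketch below takes the reconstruction, rendering, and denoising steps as caller-supplied callables, since the paper's actual interfaces are not given here.

```python
def progressive_densify(fit, render, denoise, sample_poses, images, poses, rounds=3):
    """Sketch of Deceptive-NeRF/3DGS-style densification (names assumed):
    fit a reconstruction, render noisy novel views, denoise them into
    pseudo-observations, and fold those back into the training set."""
    model = None
    for _ in range(rounds):
        model = fit(images, poses)          # NeRF or 3DGS from the current set
        new_poses = sample_poses(poses)     # pick currently unobserved viewpoints
        pseudo = [denoise(render(model, p)) for p in new_poses]
        images, poses = images + pseudo, poses + new_poses
    return model
```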
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (arXiv, 2023-04-13)
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
- Vox-E: Text-guided Voxel Editing of 3D Objects (arXiv, 2023-03-21)
Large scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images.
We present a technique that harnesses the power of latent diffusion models for editing existing 3D objects.
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion (arXiv, 2023-02-20)
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
We propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time.
We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views.
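The alternation this entry describes, between CDM refinement and NeRF finetuning at test time, reduces to a short loop; the sketch below assumes placeholder interfaces for both models rather than NerfDiff's actual API.

```python
def nerf_guided_distillation(nerf, cdm_refine, input_view, poses, steps=50):
    """Sketch of NeRF-guided distillation as summarized above (assumed API):
    render virtual views, refine them with the 3D-aware conditional
    diffusion model (CDM), then finetune the NeRF on the refined views."""
    for _ in range(steps):
        virtual = [nerf.render(p) for p in poses]               # current virtual views
        refined = [cdm_refine(v, input_view) for v in virtual]  # CDM cleanup pass
        nerf.finetune(refined, poses)                           # fit NeRF to refined views
    return nerf
```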
- NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors (arXiv, 2022-12-06)
We propose NeRDi, a single-view NeRF synthesis framework with general image priors from 2D diffusion models.
We leverage off-the-shelf vision-language models and introduce a two-section language guidance as conditioning inputs to the diffusion model.
We also demonstrate our generalizability in zero-shot NeRF synthesis for in-the-wild images.
- 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models (arXiv, 2022-11-25)
We equip text-guided diffusion models to achieve 3D-consistent generation.
We study 3D local editing and propose a two-step solution.
We extend our model to perform one-shot novel view synthesis.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.