Editing 3D Scenes via Text Prompts without Retraining
- URL: http://arxiv.org/abs/2309.04917v3
- Date: Thu, 30 Nov 2023 03:59:28 GMT
- Title: Editing 3D Scenes via Text Prompts without Retraining
- Authors: Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding,
Shuchang Zhou, Ming-Hsuan Yang
- Abstract summary: DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities.
Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images.
Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
- Score: 80.57814031701744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous diffusion models have recently been applied to image synthesis and
editing. However, editing 3D scenes is still in its early stages. It poses
various challenges, such as the requirement to design specific methods for
different editing types, retraining new models for various 3D scenes, and the
absence of convenient human interaction during editing. To tackle these issues,
we introduce a text-driven editing method, termed DN2N, which allows for the
direct acquisition of a NeRF model with universal editing capabilities,
eliminating the requirement for retraining. Our method employs off-the-shelf
text-based editing models of 2D images to modify the 3D scene images, followed
by a filtering process to discard poorly edited images that disrupt 3D
consistency. We then consider the remaining inconsistency as a problem of
removing noise perturbation, which can be solved by generating training data
with similar perturbation characteristics. We further propose
cross-view regularization terms to help the generalized NeRF model mitigate
these perturbations. Our text-driven method allows users to edit a 3D scene
with their desired description, which is more user-friendly, intuitive, and
practical than prior works. Empirical results show that our method achieves
multiple editing types, including but not limited to appearance editing,
weather transition, material changing, and style transfer. Most importantly,
our method generalizes well with editing abilities shared among a set of model
parameters without requiring a customized editing model for some specific
scenes, thus inferring novel views with editing effects directly from user
input. The project website is available at https://sk-fun.fun/DN2N
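Read as a pipeline, the abstract describes three stages: edit each captured view with an off-the-shelf 2D text-based editor, filter out edited views that break 3D consistency, and train a generalizable NeRF whose cross-view regularization treats the remaining inconsistency as noise. The following is a minimal Python sketch of that flow, not the authors' implementation: edit_2d, consistency_score, and render_view are hypothetical placeholder callables, and the simple mean-squared penalties stand in for unspecified losses rather than reproducing the paper's actual terms.

# Minimal sketch of a DN2N-style pipeline (not the authors' code).
# `edit_2d`, `consistency_score`, and `render_view` are hypothetical
# placeholders; the toy losses below are illustrative stand-ins only.
from typing import Callable, List, Sequence, Tuple
import numpy as np

Image = np.ndarray  # H x W x 3 view of the scene


def edit_and_filter_views(
    views: Sequence[Image],
    prompt: str,
    edit_2d: Callable[[Image, str], Image],
    consistency_score: Callable[[Image, Sequence[Image]], float],
    threshold: float = 0.8,
) -> List[Image]:
    """Steps 1-2: edit every captured view with an off-the-shelf 2D text-based
    editor, then discard edits that disagree too strongly with the others."""
    edited = [edit_2d(v, prompt) for v in views]
    return [e for e in edited if consistency_score(e, edited) >= threshold]


def cross_view_penalty(pred_a: Image, pred_b: Image) -> float:
    """Toy cross-view regularizer: penalize disagreement between two rendered
    views so residual inconsistency is absorbed like noise perturbation."""
    return float(np.mean((pred_a - pred_b) ** 2))


def training_loss(
    render_view: Callable[[int], Image],
    targets: Sequence[Image],
    pair: Tuple[int, int],
    reg_weight: float = 0.1,
) -> float:
    """Step 3: one conceptual training objective for the generalized NeRF,
    combining per-view reconstruction with the cross-view term."""
    a, b = pair
    pred_a, pred_b = render_view(a), render_view(b)
    recon = float(np.mean((pred_a - targets[a]) ** 2)
                  + np.mean((pred_b - targets[b]) ** 2))
    return recon + reg_weight * cross_view_penalty(pred_a, pred_b)

Only the control flow is fixed by this sketch; the filtering criterion and the precise form of the cross-view regularization are the parts contributed by the paper itself.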
Related papers
- Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts [76.73043724587679]
We propose a dialogue-based 3D scene editing approach, termed CE3D.
Hash-Atlas represents 3D scene views, transferring the editing of 3D scenes onto 2D atlas images.
Results demonstrate that CE3D effectively integrates multiple visual models to achieve diverse editing visual effects.
arXiv Detail & Related papers (2024-07-09T13:24:42Z)
- ICE-G: Image Conditional Editing of 3D Gaussian Splats [45.112689255145625]
We introduce a novel approach to quickly edit a 3D model from a single reference view.
Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views.
A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner.
arXiv Detail & Related papers (2024-06-12T17:59:52Z)
- Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection [60.47731445033151]
We propose a novel unified editing framework that combines the strengths of both approaches by utilizing only a basic 2D text-to-image (T2I) diffusion model.
Experimental results confirm that our method enables editing across diverse modalities including 3D scenes, videos, and panorama images.
arXiv Detail & Related papers (2024-05-27T04:44:36Z)
- View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes.
By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z)
- Real-time 3D-aware Portrait Editing from a Single Image [111.27169315556444]
3DPE can edit a face image following given prompts, like reference images or text descriptions.
A lightweight module is distilled from a 3D portrait generator and a text-to-image model.
arXiv Detail & Related papers (2024-02-21T18:36:26Z)
- Free-Editor: Zero-shot Text-driven 3D Scene Editing [8.966537479017951]
Training a diffusion model specifically for 3D scene editing is challenging due to the scarcity of large-scale datasets.
We introduce a novel, training-free 3D scene editing technique called Free-Editor, which enables users to edit 3D scenes without the need for model retraining.
Our method effectively addresses the issue of multi-view style inconsistency found in state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2023-12-21T08:40:57Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing.
For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
arXiv Detail & Related papers (2023-12-04T06:25:06Z)
- Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields [14.803266838721864]
Seal-3D allows users to edit NeRF models in a pixel-level, free-form manner with a wide range of NeRF-like backbones and preview the editing effects instantly.
A NeRF editing system is built to showcase various editing types.
arXiv Detail & Related papers (2023-07-27T18:08:19Z)
- SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image.
To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space.
Our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)