Free-Editor: Zero-shot Text-driven 3D Scene Editing
- URL: http://arxiv.org/abs/2312.13663v2
- Date: Sun, 14 Jul 2024 03:52:51 GMT
- Title: Free-Editor: Zero-shot Text-driven 3D Scene Editing
- Authors: Nazmul Karim, Hasan Iqbal, Umar Khalid, Jing Hua, Chen Chen
- Abstract summary: Training a diffusion model specifically for 3D scene editing is challenging due to the scarcity of large-scale datasets.
We introduce a novel, training-free 3D scene editing technique called Free-Editor, which enables users to edit 3D scenes without the need for model retraining.
Our method effectively addresses the issue of multi-view style inconsistency found in state-of-the-art (SOTA) methods.
- Score: 8.966537479017951
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-Image (T2I) diffusion models have recently gained traction for their versatility and user-friendliness in 2D content generation and editing. However, training a diffusion model specifically for 3D scene editing is challenging due to the scarcity of large-scale datasets. Currently, editing 3D scenes necessitates either retraining the model to accommodate various 3D edits or developing specific methods tailored to each unique editing type. Moreover, state-of-the-art (SOTA) techniques require multiple synchronized edited images from the same scene to enable effective scene editing. Given the current limitations of T2I models, achieving consistent editing effects across multiple images remains difficult, leading to multi-view inconsistency in editing. This inconsistency undermines the performance of 3D scene editing when these images are utilized. In this study, we introduce a novel, training-free 3D scene editing technique called Free-Editor, which enables users to edit 3D scenes without the need for model retraining during the testing phase. Our method effectively addresses the issue of multi-view style inconsistency found in state-of-the-art (SOTA) methods through the implementation of a single-view editing scheme. Specifically, we demonstrate that editing a particular 3D scene can be achieved by modifying only a single view. To facilitate this, we present an Edit Transformer that ensures intra-view consistency and inter-view style transfer using self-view and cross-view attention mechanisms, respectively. By eliminating the need for model retraining and multi-view editing, our approach significantly reduces editing time and memory resource requirements, achieving runtimes approximately 20 times faster than SOTA methods. We have performed extensive experiments on various benchmark datasets, showcasing the diverse editing capabilities of our proposed technique.
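The abstract describes self-view attention (for intra-view consistency) and cross-view attention (for inter-view style transfer) only at a high level. The sketch below is a generic scaled dot-product attention illustration of that distinction, not the paper's Edit Transformer: the function names, the token lists, and the absence of learned projections are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of float vectors.

    Assumption: no learned query/key/value projections, unlike a real
    transformer layer.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out

def self_view_attention(target_tokens):
    """Hypothetical helper: tokens of one view attend to each other."""
    return attention(target_tokens, target_tokens, target_tokens)

def cross_view_attention(target_tokens, edited_tokens):
    """Hypothetical helper: target-view tokens query the single edited view."""
    return attention(target_tokens, edited_tokens, edited_tokens)

# Toy features (made up): 2 tokens in the target view, 3 in the edited view.
target_tokens = [[1.0, 0.0], [0.0, 1.0]]
edited_tokens = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

intra = self_view_attention(target_tokens)
inter = cross_view_attention(target_tokens, edited_tokens)
```

In this toy setup each output token is a weighted average of value vectors, so cross-view attention can only mix features that exist in the edited view. That is one simple way to picture how a single edited view could supply the style for other views.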
Related papers
- ICE-G: Image Conditional Editing of 3D Gaussian Splats [45.112689255145625]
We introduce a novel approach to quickly edit a 3D model from a single reference view.
Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views.
A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner.
arXiv Detail & Related papers (2024-06-12T17:59:52Z)
- Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection [60.47731445033151]
We propose a novel unified editing framework that combines the strengths of both approaches by utilizing only a basic 2D image text-to-image (T2I) diffusion model.
Experimental results confirm that our method enables editing across diverse modalities including 3D scenes, videos, and panorama images.
arXiv Detail & Related papers (2024-05-27T04:44:36Z)
- View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes.
By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z)
- Real-time 3D-aware Portrait Editing from a Single Image [111.27169315556444]
3DPE can edit a face image following given prompts, like reference images or text descriptions.
A lightweight module is distilled from a 3D portrait generator and a text-to-image model.
arXiv Detail & Related papers (2024-02-21T18:36:26Z)
- Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models [83.97844535389073]
A major obstacle hindering the widespread adoption of 3D content editing is its time-intensive processing.
We propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.
In most scenarios, our proposed technique brings a 10× speed-up compared to the baseline method and completes the editing of a 3D scene in 2 minutes with comparable quality.
arXiv Detail & Related papers (2023-12-13T23:27:17Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing.
For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
arXiv Detail & Related papers (2023-12-04T06:25:06Z)
- Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities.
Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images.
Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
- SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image.
To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space.
Our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.