Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview
Correspondence-Enhanced Diffusion Models
- URL: http://arxiv.org/abs/2312.08563v2
- Date: Tue, 26 Dec 2023 21:30:14 GMT
- Title: Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview
Correspondence-Enhanced Diffusion Models
- Authors: Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan,
Hao Tang
- Abstract summary: A major obstacle hindering the widespread adoption of 3D content editing is its time-intensive processing.
We propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated.
In most scenarios, our proposed technique brings a 10$\times$ speed-up compared to the baseline method and completes the editing of a 3D scene in 2 minutes with comparable quality.
- Score: 83.97844535389073
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advancement of text-driven 3D content editing has been blessed by the
progress from 2D generative diffusion models. However, a major obstacle
hindering the widespread adoption of 3D content editing is its time-intensive
processing. This challenge arises from the iterative and refining steps
required to achieve consistent 3D outputs from 2D image-based generative
models. Recent state-of-the-art methods typically require optimization time
ranging from tens of minutes to several hours to edit a 3D scene using a single
GPU. In this work, we propose that by incorporating correspondence
regularization into diffusion models, the process of 3D editing can be
significantly accelerated. This approach is inspired by the notion that the
estimated samples during diffusion should be multiview-consistent during the
diffusion generation process. By leveraging this multiview consistency, we can
edit 3D content at a much faster speed. In most scenarios, our proposed
technique brings a 10$\times$ speed-up compared to the baseline method and
completes the editing of a 3D scene in 2 minutes with comparable quality.
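To make the core idea concrete, below is a minimal sketch of what cross-view correspondence regularization of the per-step clean-image (x0) estimates could look like. This is not the authors' implementation: the function name, the simple blending rule, and the assumption that pixel correspondences between views are provided are all illustrative.

```python
# Hedged sketch: blend each view's predicted x0 with values warped from other
# views via known pixel correspondences, so the per-step estimates stay
# multiview-consistent. Illustrative only; not the paper's actual code.
import numpy as np

def regularize_multiview_x0(x0_views, correspondences, weight=0.5):
    """Pull corresponding pixels across views toward each other.

    x0_views: (V, H, W, C) array of per-view x0 estimates at the current step.
    correspondences: dict mapping (src_view, dst_view) -> (src_pix, dst_pix),
        each an (N, 2) integer array of matching (row, col) pixel coordinates.
    weight: strength of the pull toward the cross-view counterpart.
    """
    out = x0_views.copy()
    for (src, dst), (src_pix, dst_pix) in correspondences.items():
        src_vals = x0_views[src, src_pix[:, 0], src_pix[:, 1]]  # (N, C)
        dst_vals = x0_views[dst, dst_pix[:, 0], dst_pix[:, 1]]  # (N, C)
        # Blend the destination pixels toward their source-view counterparts.
        out[dst, dst_pix[:, 0], dst_pix[:, 1]] = (
            (1.0 - weight) * dst_vals + weight * src_vals
        )
    return out

# Toy usage: two 8x8 RGB views with a few known correspondences.
x0 = np.random.rand(2, 8, 8, 3)
pix = np.array([[1, 1], [2, 3], [4, 4]])
x0_reg = regularize_multiview_x0(x0, {(0, 1): (pix, pix)})
```

In a full pipeline, a step of this kind would be applied to the diffusion model's intermediate x0 predictions at selected denoising steps before sampling continues, which is what encourages the multiview consistency described in the abstract.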
Related papers
- DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation [17.930032337081673]
Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks.
We propose DreamCatalyst, a novel framework that accounts for the sampling dynamics of the SDS framework.
Our method offers two modes: (1) a fast mode that edits scenes 23 times faster than current state-of-the-art NeRF editing methods, and (2) a high-quality mode that produces superior results about 8 times faster than these methods.
arXiv Detail & Related papers (2024-07-16T05:26:14Z)
- 3DEgo: 3D Editing on the Go! [6.072473323242202]
We introduce 3DEgo to address a novel problem of directly synthesizing 3D scenes from monocular videos guided by textual prompts.
Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow.
3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources.
arXiv Detail & Related papers (2024-07-14T07:03:50Z)
- DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions.
A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process.
This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z)
- Generic 3D Diffusion Adapter Using Controlled Multi-View Editing [44.99706994361726]
Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity.
This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images.
MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation.
arXiv Detail & Related papers (2024-03-18T17:59:09Z)
- Real-time 3D-aware Portrait Editing from a Single Image [111.27169315556444]
3DPE can edit a face image following given prompts, like reference images or text descriptions.
A lightweight module is distilled from a 3D portrait generator and a text-to-image model.
arXiv Detail & Related papers (2024-02-21T18:36:26Z)
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z)
- 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models [71.25937799010407]
We equip text-guided diffusion models to achieve 3D-consistent generation.
We study 3D local editing and propose a two-step solution.
We extend our model to perform one-shot novel view synthesis.
arXiv Detail & Related papers (2022-11-25T13:50:00Z)