Related papers: CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion

URL: http://arxiv.org/abs/2412.01792v1
Date: Mon, 02 Dec 2024 18:38:51 GMT
Title: CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion
Authors: Kai He, Chin-Hsuan Wu, Igor Gilitschenski
Abstract summary: We introduce a novel framework that first fine-tunes the InstructPix2Pix model, followed by a two-stage optimization of the scene. Our approach enables consistent and precise local edits without the need for tracking desired editing regions. Compared to state-of-the-art methods, our approach offers more flexible and controllable local scene editing.
Score: 13.744253074367885
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent advances in 3D representations, such as Neural Radiance Fields and 3D Gaussian Splatting, have greatly improved realistic scene modeling and novel-view synthesis. However, achieving controllable and consistent editing in dynamic 3D scenes remains a significant challenge. Previous work is largely constrained by its editing backbones, resulting in inconsistent edits and limited controllability. In our work, we introduce a novel framework that first fine-tunes the InstructPix2Pix model, followed by a two-stage optimization of the scene based on deformable 3D Gaussians. Our fine-tuning enables the model to "learn" the editing ability from a single edited reference image, transforming the complex task of dynamic scene editing into a simple 2D image editing process. By directly learning editing regions and styles from the reference, our approach enables consistent and precise local edits without the need for tracking desired editing regions, effectively addressing key challenges in dynamic scene editing. Then, our two-stage optimization progressively edits the trained dynamic scene, using a designed edited image buffer to accelerate convergence and improve temporal consistency. Compared to state-of-the-art methods, our approach offers more flexible and controllable local scene editing, achieving high-quality and consistent results.

Related papers

Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors [67.22744959435708]
3D semantic parsing often underperforms compared to its 2D counterpart, making targeted manipulations within 3D spaces more difficult and limiting the fidelity of edits. We address this problem by leveraging 2D diffusion editing to accurately identify modification regions in each view, followed by inverse rendering for 3D localization. Experiments demonstrate that our method achieves state-of-the-art performance while delivering up to a $4\times$ speedup.
arXiv Detail & Related papers (2025-07-07T19:15:43Z)
Instruct-4DGS: Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation [25.047474784265773]
Instruct-4DGS is an efficient dynamic scene editing method that is more scalable in terms of the temporal dimension. Editing results demonstrate that Instruct-4DGS is efficient, reducing editing time by more than half compared to existing methods.
arXiv Detail & Related papers (2025-02-04T08:18:49Z)
Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting [55.14822004410817]
We introduce DYG, an effective 3D drag-based editing method for 3D Gaussian Splatting. It enables precise control over the extent of editing through the input of 3D masks and pairs of control points. DYG integrates the strengths of the implicit triplane representation to establish the geometric scaffold of the editing results.
arXiv Detail & Related papers (2025-01-30T18:51:54Z)
PrEditor3D: Fast and Precise 3D Shape Editing [100.09112677669376]
We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes. The edited 3D mesh aligns well with the prompts, and remains identical for regions that are not intended to be altered.
arXiv Detail & Related papers (2024-12-09T15:44:47Z)
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process. This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z)
View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes. By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z)
Free-Editor: Zero-shot Text-driven 3D Scene Editing [8.966537479017951]
Training a diffusion model specifically for 3D scene editing is challenging due to the scarcity of large-scale datasets. We introduce a novel, training-free 3D scene editing technique called Free-Editor, which enables users to edit 3D scenes without the need for model retraining. Our method effectively addresses the issue of multi-view style inconsistency found in state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2023-12-21T08:40:57Z)
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt. To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing. For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
arXiv Detail & Related papers (2023-12-04T06:25:06Z)
Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities. Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images. Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image. To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space. Our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.