Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information
- URL: http://arxiv.org/abs/2503.11601v1
- Date: Fri, 14 Mar 2025 17:15:26 GMT
- Title: Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information
- Authors: Xuanqi Zhang, Jieun Lee, Chris Joslin, Wonsook Lee
- Abstract summary: We present a novel framework for enhancing the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing. Our method demonstrates superior performance in rendering quality and view consistency compared to state-of-the-art approaches.
- Score: 4.956066467858058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel framework for enhancing the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing. Existing editing approaches face two critical challenges: inconsistent geometric reconstructions across multiple viewpoints, particularly in challenging camera positions, and ineffective utilization of depth information during image manipulation, resulting in over-texture artifacts and degraded object boundaries. To address these limitations, we introduce: 1) A complementary information mutual learning network that enhances depth map estimation from 3DGS, enabling precise depth-conditioned 3D editing while preserving geometric structures. 2) A wavelet consensus attention mechanism that effectively aligns latent codes during the diffusion denoising process, ensuring multi-view consistency in the edited results. Through extensive experimentation, our method demonstrates superior performance in rendering quality and view consistency compared to state-of-the-art approaches. The results validate our framework as an effective solution for text-guided editing of 3D scenes.
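The abstract does not specify how the wavelet consensus attention aligns latent codes across views. As a purely illustrative sketch (not the authors' method), one way to enforce cross-view consensus in the frequency domain is to decompose each view's latent with a Haar wavelet transform, replace the low-frequency band with its cross-view mean, and reconstruct, so coarse structure is shared while per-view detail is preserved. The function names and the mean-consensus rule below are assumptions for illustration:

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar decomposition of an (H, W) array, H and W even.
    Returns the low-frequency band LL and detail bands (LH, HL, HH)."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def ihaar2d(ll, bands):
    """Exact inverse of haar2d."""
    lh, hl, hh = bands
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d = np.empty_like(a)
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :] = a + d
    x[1::2, :] = a - d
    return x

def wavelet_consensus(latents):
    """Align per-view latents by replacing each view's low-frequency
    band with the cross-view mean, keeping per-view details intact."""
    decomposed = [haar2d(z) for z in latents]
    ll_mean = np.mean([ll for ll, _ in decomposed], axis=0)
    return [ihaar2d(ll_mean, bands) for _, bands in decomposed]
```

With a single view the operation is the identity; with several views, the reconstructed latents share one coarse layout, which is the flavor of multi-view consistency the abstract describes, though the paper's actual mechanism uses attention rather than a simple mean.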
Related papers
- EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting [3.9006270555948133]
We propose EditSplat, a text-driven 3D scene editing framework that integrates Multi-view Fusion Guidance (MFG) and Attention-Guided Trimming (AGT)
Our MFG ensures multi-view consistency by incorporating essential multi-view information into the diffusion process.
Our AGT utilizes the explicit representation of 3DGS to selectively prune and optimize 3D Gaussians, enhancing optimization efficiency and enabling precise, semantically rich local editing.
arXiv Detail & Related papers (2024-12-16T07:56:04Z) - TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation [35.951718189386845]
We propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS)
TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing error accumulation from the text-to-image process.
We present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric reference from the source branch to yield aligned views from the target branch during the editing of 2D views.
arXiv Detail & Related papers (2024-07-02T08:06:58Z) - SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing [58.22339174221563]
We propose SyncNoise, a novel geometry-guided multi-view consistent noise editing approach for high-fidelity 3D scene editing.
SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing multi-view noise predictions to be geometrically consistent.
Our method achieves high-quality 3D editing results respecting the textual instructions, especially in scenes with complex textures.
arXiv Detail & Related papers (2024-06-25T09:17:35Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions.
A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process.
This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z) - Reference-Based 3D-Aware Image Editing with Triplanes [15.222454412573455]
Generative Adversarial Networks (GANs) have emerged as powerful tools for high-quality image generation and real image editing by manipulating their latent spaces.
Recent advancements in GANs include 3D-aware models such as EG3D, which feature efficient triplane-based architectures capable of reconstructing 3D geometry from single images.
This study addresses this gap by exploring and demonstrating the effectiveness of the triplane space for advanced reference-based edits.
arXiv Detail & Related papers (2024-04-04T17:53:33Z) - View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes. By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z) - GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing [38.948892064761914]
GaussCtrl is a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS)
Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image.
arXiv Detail & Related papers (2024-03-13T17:35:28Z) - Consolidating Attention Features for Multi-view Image Editing [126.19731971010475]
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z) - SERF: Fine-Grained Interactive 3D Segmentation and Editing with Radiance Fields [92.14328581392633]
We introduce a novel fine-grained interactive 3D segmentation and editing algorithm with radiance fields, which we refer to as SERF.
Our method entails creating a neural mesh representation by integrating multi-view algorithms with pre-trained 2D models.
Building upon this representation, we introduce a novel surface rendering technique that preserves local information and is robust to deformation.
arXiv Detail & Related papers (2023-12-26T02:50:42Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis [38.517819699560945]
Our system consists of three major components: (1) a 3D-semantics-aware generative model that produces view-consistent, disentangled face images and semantic masks; (2) a hybrid GAN inversion approach that initializes the latent codes from the semantic and texture encoder, and further optimizes them for faithful reconstruction; and (3) a canonical editor that enables efficient manipulation of semantic masks in the canonical view and produces high-quality editing results.
arXiv Detail & Related papers (2022-05-31T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.