EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2412.11520v1
- Date: Mon, 16 Dec 2024 07:56:04 GMT
- Title: EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
- Authors: Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Ha Dam Baek, Shin Sangheon, Sangmin Kim, Sangpil Kim
- Abstract summary: We propose EditSplat, a novel 3D editing framework that integrates Multi-view Fusion Guidance (MFG) and Attention-Guided Trimming (AGT).
Our MFG ensures multi-view consistency by incorporating essential multi-view information into the diffusion process, leveraging classifier-free guidance from the text-to-image diffusion model and the geometric properties of 3DGS.
Our AGT leverages the explicit representation of 3DGS to selectively prune and optimize 3D Gaussians, enhancing optimization efficiency and enabling precise, semantically rich local edits.
- Score: 3.9006270555948133
- Abstract: Recent advancements in 3D editing have highlighted the potential of text-driven methods in real-time, user-friendly AR/VR applications. However, current methods rely on 2D diffusion models without adequately considering multi-view information, resulting in multi-view inconsistency. While 3D Gaussian Splatting (3DGS) significantly improves rendering quality and speed, its 3D editing process encounters difficulties with inefficient optimization, as pre-trained Gaussians retain excessive source information, hindering optimization. To address these limitations, we propose EditSplat, a novel 3D editing framework that integrates Multi-view Fusion Guidance (MFG) and Attention-Guided Trimming (AGT). Our MFG ensures multi-view consistency by incorporating essential multi-view information into the diffusion process, leveraging classifier-free guidance from the text-to-image diffusion model and the geometric properties of 3DGS. Additionally, our AGT leverages the explicit representation of 3DGS to selectively prune and optimize 3D Gaussians, enhancing optimization efficiency and enabling precise, semantically rich local edits. Through extensive qualitative and quantitative evaluations, EditSplat achieves superior multi-view consistency and editing quality over existing methods, significantly enhancing overall efficiency.
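To make the two components more concrete, below is a minimal PyTorch-style sketch of how Multi-view Fusion Guidance and Attention-Guided Trimming could slot into an editing loop. It assumes an InstructPix2Pix-style denoiser conditioned on both a text prompt and a reference image, plus a hypothetical per-Gaussian relevance score lifted from 2D cross-attention maps; the function names, signatures, and guidance layout are illustrative assumptions, not the authors' actual implementation.

```python
import torch

def mfg_noise_prediction(eps_model, z_t, t, text_emb, null_text,
                         fused_ref, null_ref, s_text=7.5, s_ref=1.5):
    # Multi-view Fusion Guidance (sketch). `fused_ref` stands in for a
    # reference image assembled by reprojecting renders from neighboring
    # cameras into the current view using the 3D Gaussians' geometry.
    # The guidance layout mirrors InstructPix2Pix-style classifier-free
    # guidance with separate image and text scales.
    eps_uncond = eps_model(z_t, t, null_text, null_ref)   # unconditional
    eps_ref    = eps_model(z_t, t, null_text, fused_ref)  # image-conditioned
    eps_full   = eps_model(z_t, t, text_emb,  fused_ref)  # image + text
    return (eps_uncond
            + s_ref  * (eps_ref  - eps_uncond)   # pull toward multi-view evidence
            + s_text * (eps_full - eps_ref))     # pull toward the edit prompt

def attention_guided_trim(relevance, keep_quantile=0.7):
    # Attention-Guided Trimming (sketch). `relevance` is a hypothetical
    # per-Gaussian score obtained by splatting the prompt's 2D cross-attention
    # maps back onto the 3D Gaussians; low-relevance Gaussians are pruned or
    # frozen so optimization concentrates on the edited region.
    threshold = torch.quantile(relevance, 1.0 - keep_quantile)
    return relevance >= threshold  # boolean mask of Gaussians to keep optimizing
```

In a full pipeline, the returned mask would gate which Gaussians receive gradients during 3DGS optimization, while the fused noise prediction would replace standard single-view guidance at each diffusion step.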
Related papers
- 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement [66.8116563135326]
We present 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency.
Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles.
arXiv Detail & Related papers (2024-12-24T17:36:34Z) - MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields [73.49548565633123]
Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering.
Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images.
We propose a view synthesis framework based on 3D Gaussian Splatting, named MCGS, enabling scene reconstruction from sparse input views.
arXiv Detail & Related papers (2024-10-15T08:39:05Z) - MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis [22.80370814838661]
Recent works in volume rendering, e.g., NeRF and 3D Gaussian Splatting (3DGS), significantly advance rendering quality and efficiency.
We propose a new 3DGS optimization method embodying four key novel contributions.
arXiv Detail & Related papers (2024-10-02T23:48:31Z) - 3D Gaussian Editing with A Single Image [19.662680524312027]
We introduce a novel single-image-driven 3D scene editing approach based on 3D Gaussian Splatting.
Our method learns to optimize the 3D Gaussians to align with an edited version of the image rendered from a user-specified viewpoint.
Experiments show the effectiveness of our method in handling geometric details, long-range deformation, and non-rigid deformation.
arXiv Detail & Related papers (2024-08-14T13:17:42Z) - TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation [35.951718189386845]
We propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS).
TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing the error accumulation that arises from the text-to-image process.
We present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric references from the source branch to yield aligned views from the target branch during the editing of 2D views.
arXiv Detail & Related papers (2024-07-02T08:06:58Z) - DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions.
A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process.
This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z) - View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes.
By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z) - Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction [89.53963284958037]
We propose a novel motion-aware enhancement framework for dynamic scene reconstruction.
Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow.
For the prevalent deformation-based paradigm, which poses a harder optimization problem, a transient-aware deformation auxiliary module is proposed.
arXiv Detail & Related papers (2024-03-18T03:46:26Z) - Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models [83.97844535389073]
A major obstacle hindering the widespread adoption of 3D content editing is its time-intensive processing.
We propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated (a rough sketch of this idea follows the list below).
In most scenarios, our proposed technique brings a 10× speed-up compared to the baseline method and completes the editing of a 3D scene in 2 minutes with comparable quality.
arXiv Detail & Related papers (2023-12-13T23:27:17Z)
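As a rough illustration of the correspondence idea in Efficient-NeRF2NeRF (the loss form, names, and tensor layout here are assumptions, not the paper's implementation), one could warp the latent of one view into another using precomputed pixel correspondences and penalize disagreement during denoising:

```python
import torch
import torch.nn.functional as F

def correspondence_regularizer(lat_a, lat_b, grid_ab, valid_mask):
    # lat_a, lat_b: (1, C, H, W) latents of two views being edited in parallel.
    # grid_ab: (1, H, W, 2) correspondences mapping view-a pixels into view b,
    #          expressed as normalized grid coordinates in [-1, 1]
    #          (precomputed from scene geometry or matching; not shown here).
    # valid_mask: (1, 1, H, W) marking pixels with a reliable correspondence.
    lat_b_warped = F.grid_sample(lat_b, grid_ab, align_corners=True)
    return (valid_mask * (lat_a - lat_b_warped) ** 2).mean()
```

Adding such a term to the per-step editing objective encourages the 2D edits to agree across views, which is what would let the 3D update converge in fewer iterations.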