TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing
- URL: http://arxiv.org/abs/2405.14455v2
- Date: Sat, 1 Jun 2024 04:44:15 GMT
- Title: TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing
- Authors: Teng Xu, Jiamin Chen, Peng Chen, Youjia Zhang, Junqing Yu, Wei Yang,
- Abstract summary: This paper proposes a systematic approach, namely TIGER, for coherent text-instructed 3D Gaussian retrieval and editing.
To overcome the over-smoothing and inconsistency issues in editing, we propose a Coherent Score Distillation (CSD) that aggregates a 2D image editing diffusion model and a multi-view diffusion model for score distillation.
- Score: 12.50147114409895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Editing objects within a scene is a critical functionality required across a broad spectrum of applications in computer vision and graphics. As 3D Gaussian Splatting (3DGS) emerges as a frontier in scene representation, the effective modification of 3D Gaussian scenes has become increasingly vital. This process entails accurately retrieve the target objects and subsequently performing modifications based on instructions. Though available in pieces, existing techniques mainly embed sparse semantics into Gaussians for retrieval, and rely on an iterative dataset update paradigm for editing, leading to over-smoothing or inconsistency issues. To this end, this paper proposes a systematic approach, namely TIGER, for coherent text-instructed 3D Gaussian retrieval and editing. In contrast to the top-down language grounding approach for 3D Gaussians, we adopt a bottom-up language aggregation strategy to generate a denser language embedded 3D Gaussians that supports open-vocabulary retrieval. To overcome the over-smoothing and inconsistency issues in editing, we propose a Coherent Score Distillation (CSD) that aggregates a 2D image editing diffusion model and a multi-view diffusion model for score distillation, producing multi-view consistent editing with much finer details. In various experiments, we demonstrate that our TIGER is able to accomplish more consistent and realistic edits than prior work.
Related papers
- NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage the novel view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z) - GSEditPro: 3D Gaussian Splatting Editing with Attention-based Progressive Localization [11.170354299559998]
We propose GSEditPro, a novel 3D scene editing framework which allows users to perform various creative and precise editing using text prompts only.
We introduce an attention-based progressive localization module to add semantic labels to each Gaussian during rendering.
This enables precise localization on editing areas by classifying Gaussians based on their relevance to the editing prompts derived from cross-attention layers of the T2I model.
arXiv Detail & Related papers (2024-11-15T08:25:14Z) - DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation [57.406031264184584]
DragGaussian is a 3D object drag-editing framework based on 3D Gaussian Splatting.
Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.
arXiv Detail & Related papers (2024-05-09T14:34:05Z) - DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing [72.54566271694654]
We consider the problem of editing 3D objects and scenes based on open-ended language instructions.
A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process.
This process is often inefficient due to the need for iterative updates of costly 3D representations.
arXiv Detail & Related papers (2024-04-29T17:59:30Z) - View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes.
By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z) - GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting [10.527349772993796]
We present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models.
Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware.
arXiv Detail & Related papers (2024-03-08T08:42:23Z) - Gaussian Grouping: Segment and Edit Anything in 3D Scenes [65.49196142146292]
We propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.
Compared to the implicit NeRF representation, we show that the grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency.
arXiv Detail & Related papers (2023-12-01T17:09:31Z) - GaussianEditor: Swift and Controllable 3D Editing with Gaussian
Splatting [66.08674785436612]
3D editing plays a crucial role in many areas such as gaming and virtual reality.
Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes.
Our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation.
arXiv Detail & Related papers (2023-11-24T14:46:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.