TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
- URL: http://arxiv.org/abs/2401.14828v3
- Date: Thu, 25 Apr 2024 06:54:35 GMT
- Title: TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
- Authors: Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan
- Abstract summary: TIP-Editor is a 3D scene editing framework that accepts both text and image prompts and a 3D bounding box to specify the editing region.
Experiments demonstrate that TIP-Editor performs accurate editing that follows the text and image prompts within the specified bounding-box region.
- Score: 119.84478647745658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control over the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content, complementing the text description and enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIP-Editor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality and alignment to the prompts, both qualitatively and quantitatively.
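The abstract does not spell out the form of the localization loss; the following is a minimal sketch of one plausible formulation, assuming access to a 2D cross-attention map for the token describing the new object and a binary mask obtained by projecting the 3D bounding box into the current view. The function name and exact formulation are illustrative, not taken from the paper.

```python
import torch

def localization_loss(attn_map: torch.Tensor, bbox_mask: torch.Tensor) -> torch.Tensor:
    """Encourage the new object's cross-attention to concentrate inside the
    projected bounding-box region (hypothetical formulation, not the paper's).

    attn_map:  (H, W) cross-attention map for the new object's token.
    bbox_mask: (H, W) binary mask of the 3D bounding box projected to this view.
    """
    # Normalize the attention map so it sums to one over the image.
    attn = attn_map / (attn_map.sum() + 1e-8)
    # Fraction of attention mass inside the box; penalize whatever leaks outside.
    inside = (attn * bbox_mask).sum()
    return 1.0 - inside

# Toy usage with random stand-ins for a diffusion model's attention map.
attn = torch.rand(64, 64)
mask = torch.zeros(64, 64)
mask[16:48, 16:48] = 1.0
print(localization_loss(attn, mask).item())
```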
Related papers
- GSEditPro: 3D Gaussian Splatting Editing with Attention-based Progressive Localization [11.170354299559998]
We propose GSEditPro, a novel 3D scene editing framework that allows users to perform a variety of creative and precise edits using text prompts only.
We introduce an attention-based progressive localization module to add semantic labels to each Gaussian during rendering.
This enables precise localization of the editing areas by classifying Gaussians based on their relevance to the editing prompts, derived from the cross-attention layers of the T2I model.
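A minimal sketch of this kind of attention-based Gaussian classification might look like the following, assuming relevance scores have already been accumulated per Gaussian from the cross-attention maps; the function name and threshold are illustrative only, not GSEditPro's actual module.

```python
import torch

def label_gaussians_by_attention(per_gaussian_attn: torch.Tensor,
                                 threshold: float = 0.5) -> torch.Tensor:
    """Classify Gaussians as editable/non-editable from accumulated
    cross-attention relevance (illustrative only).

    per_gaussian_attn: (N,) relevance accumulated over rendered views
                       for the editing prompt's tokens.
    Returns a boolean mask of Gaussians treated as inside the editing region.
    """
    # Normalize scores to [0, 1] before thresholding.
    attn = per_gaussian_attn - per_gaussian_attn.min()
    attn = attn / (attn.max() + 1e-8)
    return attn > threshold

# Toy usage: 10k Gaussians with random relevance scores.
scores = torch.rand(10_000)
editable = label_gaussians_by_attention(scores, threshold=0.6)
print(editable.sum().item(), "of", scores.numel(), "Gaussians marked editable")
```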
arXiv Detail & Related papers (2024-11-15T08:25:14Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
We propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground-region editing and full-image editing.
We also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem.
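As a rough illustration of the Local-Global Iterative Editing idea, the sketch below alternates local (foreground-only) and global (full-image) optimization steps; the loss callables and schedule are placeholders, not CustomNeRF's actual objectives.

```python
import torch
from typing import Callable

def lgie_train(params, local_loss: Callable[[], torch.Tensor],
               global_loss: Callable[[], torch.Tensor],
               steps: int = 100, local_every: int = 2, lr: float = 1e-3):
    """Alternate local (foreground-only) and global (full-image) editing steps."""
    opt = torch.optim.Adam(params, lr=lr)
    for step in range(steps):
        opt.zero_grad()
        # Even steps supervise only the foreground region; odd steps the full image.
        loss = local_loss() if step % local_every == 0 else global_loss()
        loss.backward()
        opt.step()

# Toy usage: a single parameter with two stand-in loss functions.
w = torch.nn.Parameter(torch.randn(3))
lgie_train([w],
           local_loss=lambda: (w[:1] ** 2).sum(),
           global_loss=lambda: (w ** 2).sum(),
           steps=10)
print(w.detach())
```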
arXiv Detail & Related papers (2023-12-04T06:25:06Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tailored to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
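A minimal sketch of such a user-weighted combination of editing losses is shown below; the per-condition loss functions and weights are stand-ins, not the paper's actual objectives.

```python
import torch
from typing import Callable, Dict

def combined_edit_loss(losses: Dict[str, Callable[[], torch.Tensor]],
                       weights: Dict[str, float]) -> torch.Tensor:
    """Weighted sum of per-condition editing losses; the weights expose the
    'adjust the influence of each loss' knob described above (illustrative)."""
    total = torch.tensor(0.0)
    for name, fn in losses.items():
        total = total + weights.get(name, 1.0) * fn()
    return total

# Toy usage with stand-in losses for text, pose and scribble conditions.
x = torch.nn.Parameter(torch.randn(4))
losses = {
    "text": lambda: (x - 1.0).pow(2).mean(),
    "pose": lambda: x.abs().mean(),
    "scribble": lambda: (x + 0.5).pow(2).mean(),
}
loss = combined_edit_loss(losses, weights={"text": 1.0, "pose": 0.5, "scribble": 0.2})
loss.backward()
print(loss.item())
```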
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- Cut-and-Paste: Subject-Driven Video Editing with Attention Control [47.76519877672902]
We present a novel framework termed Cut-and-Paste for real-world semantic video editing under the guidance of a text prompt and an additional reference image.
Compared with current methods, the whole process of our method is like "cutting" the source object to be edited and then "pasting" the target object provided by the reference image.
arXiv Detail & Related papers (2023-11-20T12:00:06Z)
- DreamEditor: Text-Driven 3D Scene Editing with Neural Fields [115.07896366760876]
We propose a novel framework that enables users to edit neural fields using text prompts.
DreamEditor generates highly realistic textures and geometry, significantly surpassing previous works in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2023-06-23T11:53:43Z)
- SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field [37.8162035179377]
We present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image.
To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space.
Our method achieves photo-realistic 3D editing using only a single edited image, pushing the limits of semantic-driven editing in real-world 3D scenes.
arXiv Detail & Related papers (2023-03-23T13:58:11Z)
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning on text-guided image inpainting.
Its edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
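As a toy illustration of the mask-proposal idea mentioned above, the sketch below rasterizes detector boxes into a binary inpainting mask; it is an assumption about the general mechanism, not Imagen Editor's actual pipeline.

```python
import torch

def boxes_to_inpaint_mask(boxes: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Rasterize detector boxes (x1, y1, x2, y2, in pixels) into a binary
    inpainting mask covering the detected objects (illustrative only)."""
    mask = torch.zeros(height, width)
    for x1, y1, x2, y2 in boxes.round().long().tolist():
        mask[y1:y2, x1:x2] = 1.0
    return mask

# Toy usage: two detected boxes on a 128x128 image.
boxes = torch.tensor([[10.0, 20.0, 60.0, 80.0], [70.0, 30.0, 120.0, 90.0]])
mask = boxes_to_inpaint_mask(boxes, 128, 128)
print(mask.mean().item())  # fraction of the image marked for inpainting
```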
arXiv Detail & Related papers (2022-12-13T21:25:11Z)
- DE-Net: Dynamic Text-guided Image Editing Adversarial Networks [82.67199573030513]
We propose a Dynamic Editing Block (DEBlock) which combines spatial- and channel-wise manipulations dynamically for various editing requirements.
Our DE-Net achieves excellent performance and manipulates source images more effectively and accurately.
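The sketch below is a loose, hypothetical illustration of a block that mixes channel-wise and spatial-wise feature modulation under text guidance; it is not the paper's DEBlock architecture, and all layer choices are assumptions.

```python
import torch
import torch.nn as nn

class DynamicEditBlock(nn.Module):
    """Toy block mixing channel-wise and spatial-wise feature modulation,
    gated by a text embedding (loosely inspired by the DEBlock idea above)."""
    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        self.channel_mod = nn.Linear(text_dim, channels)          # per-channel scale
        self.spatial_mod = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.gate = nn.Linear(text_dim, 1)                        # balances the two paths

    def forward(self, feat: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W), text: (B, text_dim)
        ch = torch.sigmoid(self.channel_mod(text))[:, :, None, None]  # (B, C, 1, 1)
        sp = torch.sigmoid(self.spatial_mod(feat))                    # (B, 1, H, W)
        g = torch.sigmoid(self.gate(text))[:, :, None, None]          # (B, 1, 1, 1)
        return feat * (g * ch + (1 - g) * sp)

# Toy usage.
block = DynamicEditBlock(channels=32, text_dim=16)
out = block(torch.randn(2, 32, 8, 8), torch.randn(2, 16))
print(out.shape)
```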
arXiv Detail & Related papers (2022-06-02T17:20:52Z)