InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
- URL: http://arxiv.org/abs/2409.11734v1
- Date: Wed, 18 Sep 2024 06:43:40 GMT
- Title: InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
- Authors: Yan Zheng, Lemeng Wu,
- Abstract summary: We introduce Geometry-Inverse-Meet-Pixel-Insert, short for GEO, an exceptionally versatile image editing technique.
Our approach seamlessly integrates text prompts and image prompts to yield diverse and precise editing outcomes.
- Score: 20.90990477016161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce Geometry-Inverse-Meet-Pixel-Insert, short for GEO, an exceptionally versatile image editing technique designed to cater to customized user requirements at both local and global scales. Our approach seamlessly integrates text prompts and image prompts to yield diverse and precise editing outcomes. Notably, our method operates without the need for training and is driven by two key contributions: (i) a novel geometric accumulation loss that enhances DDIM inversion to faithfully preserve pixel space geometry and layout, and (ii) an innovative boosted image prompt technique that combines pixel-level editing for text-only inversion with latent space geometry guidance for standard classifier-free reversion. Leveraging the publicly available Stable Diffusion model, our approach undergoes extensive evaluation across various image types and challenging prompt editing scenarios, consistently delivering high-fidelity editing results for real images.
Related papers
- TPIE: Topology-Preserved Image Editing With Text Instructions [14.399084325078878]
Topology-Preserved Image Editing with text instructions (TPIE)
TPIE treats newly generated samples as deformable variations of a given input template, allowing for controllable and structure-preserving edits.
We validate TPIE on a diverse set of 2D and 3D images and compare them with state-of-the-art image editing approaches.
arXiv Detail & Related papers (2024-11-22T22:08:27Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, which is grounded in a theoretical analysis of the role of $eta$ in the DDIM sampling equation for enhanced editability.
arXiv Detail & Related papers (2024-03-14T15:07:36Z) - Consolidating Attention Features for Multi-view Image Editing [126.19731971010475]
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z) - Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
arXiv Detail & Related papers (2023-12-17T21:49:59Z) - BARET : Balanced Attention based Real image Editing driven by
Target-text Inversion [36.59406959595952]
We propose a novel editing technique that only requires an input image and target text for various editing types including non-rigid edits without fine-tuning diffusion model.
Our method contains three novelties: (I) Targettext Inversion Schedule (TTIS) is designed to fine-tune the input target text embedding to achieve fast image reconstruction without image caption and acceleration of convergence; (II) Progressive Transition Scheme applies progressive linear approaches between target text embedding and its fine-tuned version to generate transition embedding for maintaining non-rigid editing capability; (III) Balanced Attention Module (BAM) balances the tradeoff between textual description and image semantics
arXiv Detail & Related papers (2023-12-09T07:18:23Z) - Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion
Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z) - Direct Inversion: Optimization-Free Text-Driven Real Image Editing with
Diffusion Models [0.0]
We propose an optimization-free and zero fine-tuning framework that applies complex and non-rigid edits to a single real image via a text prompt.
We prove our method's efficacy in producing high-quality, diverse, semantically coherent, and faithful real image edits.
arXiv Detail & Related papers (2022-11-15T01:07:38Z) - FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z) - Look here! A parametric learning based approach to redirect visual
attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.