Exploring Iterative Manifold Constraint for Zero-shot Image Editing
- URL: http://arxiv.org/abs/2501.03631v2
- Date: Tue, 11 Feb 2025 09:26:43 GMT
- Title: Exploring Iterative Manifold Constraint for Zero-shot Image Editing
- Authors: Maomao Li, Yu Li, Yunfei Liu, Dong Xu
- Abstract summary: We propose a novel zero-shot editing paradigm dubbed ZZEdit.
It locates a qualified intermediate-inverted latent marked as $z_p$ as a better editing pivot.
Our ZZEdit performs iterative manifold constraint between the manifolds $M_p$ and $M_{p-1}$, leading to fewer fidelity errors.
- Score: 38.7483790652481
- License:
- Abstract: Editability and fidelity are two essential demands for text-driven image editing: the edited region should align with the target prompt while the rest of the image remains unchanged. Current cutting-edge editing methods usually follow an "inversion-then-editing" pipeline, where the input image is inverted to an approximate Gaussian noise ${z}_T$, from which a sampling process is conducted using the target prompt. Nevertheless, we argue that a near-Gaussian noise is a poor choice of pivot for further editing, since it introduces plentiful fidelity errors. We verify this with a pilot analysis, discovering that intermediate-inverted latents can achieve a better trade-off between editability and fidelity than the fully-inverted ${z}_T$. Based on this, we propose a novel zero-shot editing paradigm dubbed ZZEdit, which first locates a qualified intermediate-inverted latent ${z}_p$ as a better editing pivot: one that is sufficient for editing while structure-preserving. Then, a ZigZag process is designed to execute denoising and inversion alternately, progressively injecting target guidance into ${z}_p$ while preserving the structure information of step $p$. Afterwards, to match the number of inversion and denoising steps, we execute a pure sampling process under the target prompt. Essentially, our ZZEdit performs an iterative manifold constraint between the manifolds $M_{p}$ and $M_{p-1}$, leading to fewer fidelity errors. Extensive experiments highlight the effectiveness of ZZEdit in diverse image editing scenarios compared with the "inversion-then-editing" pipeline.
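The ZigZag procedure described in the abstract can be sketched as a simple control loop. This is a minimal illustration only, not the authors' implementation: `denoise_step` and `invert_step` are hypothetical placeholders standing in for one guided DDIM denoising step and one inversion step, and `n_zigzag` is an assumed hyperparameter.

```python
def zigzag_edit(z_p, p, target_prompt, denoise_step, invert_step, n_zigzag=3):
    """Alternate denoising (with target guidance) and inversion around step p,
    then finish with a pure sampling pass down to step 0."""
    z = z_p
    # ZigZag phase: inject target guidance while staying near step p,
    # i.e. bounce between the manifolds M_p and M_{p-1}.
    for _ in range(n_zigzag):
        z = denoise_step(z, p, target_prompt)  # p -> p-1, guided by target prompt
        z = invert_step(z, p - 1)              # p-1 -> p, restores structure
    # Pure sampling phase: denoise the remaining p steps under the target prompt,
    # so inversion and denoising end up with the same total step count.
    for t in range(p, 0, -1):
        z = denoise_step(z, t, target_prompt)
    return z
```

In this reading, each zigzag iteration pulls the latent toward the target prompt and then projects it back toward the structure-preserving manifold at step $p$, which is the "iterative manifold constraint" the paper names.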
Related papers
- Lost in Edits? A $λ$-Compass for AIGC Provenance [119.95562081325552]
We propose a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones.
LambdaTracer is effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix or performed manually with editing software such as Adobe Photoshop.
arXiv Detail & Related papers (2025-02-05T06:24:25Z) - Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing [43.97960454977206]
In this paper, we analyze the diffusion inversion and invariance control based on the flow transformer.
We propose a two-stage inversion to first refine the velocity estimation and then compensate for the leftover error.
This mechanism can simultaneously preserve the non-target contents while allowing rigid and non-rigid manipulation.
arXiv Detail & Related papers (2024-11-24T13:48:16Z) - TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z) - FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models [44.26371926512843]
We introduce a novel free approach that employs progressive $\textbf{Fre}$qu$\textbf{e}$ncy truncation to refine the guidance of $\textbf{Diff}$usion models for universal editing tasks.
Our method achieves comparable results with state-of-the-art methods across a variety of editing tasks and on a diverse set of images.
arXiv Detail & Related papers (2024-04-18T04:47:28Z) - Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models [18.75409092764653]
One crucial step in text-driven image editing is to invert the original image into a latent noise code conditioned on the source prompt.
We propose a novel method called Source Prompt Disentangled Inversion (SPDInv), which aims at reducing the impact of source prompt.
The experimental results show that our proposed SPDInv method can effectively mitigate the conflicts between the target editing prompt and the source prompt.
arXiv Detail & Related papers (2024-03-17T06:19:30Z) - Doubly Abductive Counterfactual Inference for Text-based Image Editing [130.46583155383735]
We study text-based image editing (TBIE) of a single image by counterfactual inference.
We propose a Doubly Abductive Counterfactual inference framework (DAC)
Our DAC achieves a good trade-off between editability and fidelity.
arXiv Detail & Related papers (2024-03-05T13:59:21Z) - Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [44.311286151669464]
We present a novel approach called Tuning-free Inversion-enhanced Control (TIC)
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
arXiv Detail & Related papers (2023-12-22T11:13:22Z) - Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance across various editing tasks while maintaining a seamless workflow (under 3 seconds on a single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z) - Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z) - SDEdit: Image Synthesis and Editing with Stochastic Differential Equations [113.35735935347465]
We introduce Stochastic Differential Editing (SDEdit), based on a recent generative model using stochastic differential equations (SDEs)
Given an input image with user edits, we first add noise to the input according to an SDE, and subsequently denoise it by simulating the reverse SDE to gradually increase its likelihood under the prior.
Our method does not require task-specific loss function designs, which are critical components for recent image editing methods based on GAN inversions.
arXiv Detail & Related papers (2021-08-02T17:59:47Z)
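The SDEdit procedure above (perturb the user-edited input with noise, then denoise it back toward the data manifold) can be sketched as follows. This is a heavily simplified illustration, not the paper's method: `denoiser` is a hypothetical placeholder for a learned score-based denoiser, and the noise schedule here is an assumed linear ramp rather than a true reverse-SDE solver.

```python
import numpy as np

def sdedit(image, denoiser, t0=0.5, n_steps=50, rng=None):
    """Perturb the edited input with Gaussian noise at level t0, then
    iteratively denoise it, stepping the noise level down to zero."""
    rng = rng or np.random.default_rng(0)
    # Forward step: add noise corresponding to an intermediate diffusion time t0.
    # Larger t0 increases realism of the output; smaller t0 increases faithfulness
    # to the user's edit.
    x = image + t0 * rng.standard_normal(image.shape)
    # Reverse steps: repeatedly denoise while annealing the noise level to 0,
    # gradually increasing the sample's likelihood under the prior.
    for t in np.linspace(t0, 0.0, n_steps):
        x = denoiser(x, t)
    return x
```

The choice of `t0` is the key knob the paper discusses: it trades off faithfulness to the user's edit against realism of the final image.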
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.