Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
- URL: http://arxiv.org/abs/2312.14611v1
- Date: Fri, 22 Dec 2023 11:13:22 GMT
- Title: Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
- Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong
Fei, Mingyuan Fan, Junshi Huang
- Abstract summary: We present a novel approach called Tuning-free Inversion-enhanced Control (TIC)
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
- Score: 44.311286151669464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consistent editing of real images is a challenging task, as it requires
performing non-rigid edits (e.g., changing postures) to the main objects in the
input image without changing their identity or attributes. To guarantee
consistent attributes, some existing methods fine-tune the entire model or the
textual embedding for structural consistency, but they are time-consuming and
fail to perform non-rigid edits. Other works are tuning-free, but their
performance is limited by the quality of Denoising Diffusion Implicit Model
(DDIM) reconstruction, which often fails in real-world scenarios. In this
paper, we present a novel approach called Tuning-free Inversion-enhanced
Control (TIC), which directly correlates features from the inversion process
with those from the sampling process to mitigate the inconsistency in DDIM
reconstruction. Specifically, our method effectively obtains inversion features
from the key and value features in the self-attention layers, and enhances the
sampling process with these inversion features, thus achieving accurate
reconstruction and content-consistent editing. To extend the applicability of
our method to general editing scenarios, we also propose a mask-guided
attention concatenation strategy that combines contents from both the inversion
and the naive DDIM editing processes. Experiments show that the proposed method
outperforms previous works in reconstruction and consistent editing, and
produces impressive results in various settings.
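For context on why DDIM reconstruction is fragile: deterministic DDIM inversion maps a real image back to a noise latent by running the update below forward in time, approximating the noise prediction at the next step with the one at the current step. That approximation, and its interaction with large classifier-free guidance scales, is a common source of the reconstruction error this paper targets. A standard form of the inversion step, in the usual cumulative-schedule notation:

```latex
x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\,
          \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}
               {\sqrt{\bar{\alpha}_t}}
        + \sqrt{1-\bar{\alpha}_{t+1}}\,\epsilon_\theta(x_t, t)
```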
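To make the core mechanism concrete, here is a minimal PyTorch sketch of how inversion-time key/value features might be recorded and then reused during sampling. The abstract describes the idea but not an API; the class name, the per-timestep cache, and the mode switch are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


class InversionEnhancedAttention(torch.nn.Module):
    """Self-attention that records K/V during DDIM inversion and reuses
    them, concatenated with the current K/V, during sampling.
    Hypothetical sketch; not the paper's actual code."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.to_q = torch.nn.Linear(dim, dim, bias=False)
        self.to_k = torch.nn.Linear(dim, dim, bias=False)
        self.to_v = torch.nn.Linear(dim, dim, bias=False)
        self.to_out = torch.nn.Linear(dim, dim)
        self.mode = "plain"   # "invert" records K/V, "sample" reuses them
        self.cache = {}       # timestep -> (keys, values) from inversion

    def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
        b, n, d = x.shape
        h = self.heads

        def split(proj):
            # (b, n, d) -> (b, heads, n, d_head)
            return proj(x).view(b, n, h, d // h).transpose(1, 2)

        q, k, v = split(self.to_q), split(self.to_k), split(self.to_v)

        if self.mode == "invert":
            # Record inversion-time keys/values at this timestep.
            self.cache[t] = (k.detach(), v.detach())
        elif self.mode == "sample" and t in self.cache:
            # Enhance sampling: attend over inversion features as well,
            # anchoring the edit to the source image's content.
            k_inv, v_inv = self.cache[t]
            k = torch.cat([k_inv, k], dim=2)  # concat along sequence dim
            v = torch.cat([v_inv, v], dim=2)

        out = F.scaled_dot_product_attention(q, k, v)  # (b, h, n, d//h)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```

In use, one would set `attn.mode = "invert"` for the inversion pass so each layer caches its K/V per timestep, then `attn.mode = "sample"` for the editing pass so each denoising step attends over both the current and the cached features.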
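The mask-guided attention concatenation strategy is described only at a high level in the abstract; the blending granularity (attention features vs. latents) and the mask source are not specified there. As a loose illustration of the combination idea only, with hypothetical names and under the assumption of a simple spatial blend:

```python
import torch


def mask_guided_combine(feat_inversion: torch.Tensor,
                        feat_edit: torch.Tensor,
                        mask: torch.Tensor) -> torch.Tensor:
    """Blend the two branches spatially (illustrative assumption).

    feat_inversion: (B, C, H, W) from the inversion-enhanced pass,
                    preserving the source image's identity/background.
    feat_edit:      (B, C, H, W) from the naive DDIM editing pass,
                    carrying the new, edited content.
    mask:           (B, 1, H, W) in [0, 1]; 1 marks the region to edit.
    """
    return mask * feat_edit + (1.0 - mask) * feat_inversion
```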
Related papers
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing [56.536695050042546]
We propose a training-free approach for non-rigid editing with Stable Diffusion.
Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling.
We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality.
arXiv Detail & Related papers (2024-02-13T17:08:35Z) - Noise Map Guidance: Inversion with Spatial Context for Real Image Editing [23.513950664274997]
Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images.
Applying them to real-image editing, however, often falters because the text condition degrades reconstruction quality and, in turn, editing fidelity.
We present Noise Map Guidance (NMG), an inversion method rich in spatial context, tailored for real-image editing.
arXiv Detail & Related papers (2024-02-07T07:16:12Z) - Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance in various editing tasks and also maintains a seamless workflow (under 3 seconds on a single A40 GPU), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z) - Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z) - LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance.
This approach achieves versatile edits, both subtle and extensive as well as alterations in composition and style, while requiring no optimization nor extensions to the architecture.
arXiv Detail & Related papers (2023-07-02T09:11:09Z) - Improving Tuning-Free Real Image Editing with Proximal Guidance [21.070356480624397]
Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories under larger classifier-free guidance (CFG) scales.
Negative-prompt inversion (NPI) offers a training-free, closed-form approximation of NTI, but it may introduce artifacts and is still constrained by DDIM reconstruction quality.
We extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process.
arXiv Detail & Related papers (2023-06-08T17:57:18Z) - Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can be intuitively composited from the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z)