Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
- URL: http://arxiv.org/abs/2312.14611v1
- Date: Fri, 22 Dec 2023 11:13:22 GMT
- Title: Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
- Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong
Fei, Mingyuan Fan, Junshi Huang
- Abstract summary: We present a novel approach called Tuning-free Inversion-enhanced Control (TIC)
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
- Score: 44.311286151669464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consistent editing of real images is a challenging task, as it requires
performing non-rigid edits (e.g., changing postures) to the main objects in the
input image without changing their identity or attributes. To guarantee
consistent attributes, some existing methods fine-tune the entire model or the
textual embedding for structural consistency, but they are time-consuming and
fail to perform non-rigid edits. Other works are tuning-free, but their
performances are weakened by the quality of Denoising Diffusion Implicit Model
(DDIM) reconstruction, which often fails in real-world scenarios. In this
paper, we present a novel approach called Tuning-free Inversion-enhanced
Control (TIC), which directly correlates features from the inversion process
with those from the sampling process to mitigate the inconsistency in DDIM
reconstruction. Specifically, our method effectively obtains inversion features
from the key and value features in the self-attention layers, and enhances the
sampling process by these inversion features, thus achieving accurate
reconstruction and content-consistent editing. To extend the applicability of
our method to general editing scenarios, we also propose a mask-guided
attention concatenation strategy that combines contents from both the inversion
and the naive DDIM editing processes. Experiments show that the proposed method
outperforms previous works in reconstruction and consistent editing, and
produces impressive results in various settings.
Related papers
- Lost in Edits? A $λ$-Compass for AIGC Provenance [119.95562081325552]
We propose a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones.
LambdaTracer is effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix or performed manually with editing software such as Adobe Photoshop.
arXiv Detail & Related papers (2025-02-05T06:24:25Z) - Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps.
Our method effectively minimizes distortions caused by varying text conditions during noise prediction.
Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
arXiv Detail & Related papers (2024-11-29T12:11:28Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Noise Map Guidance: Inversion with Spatial Context for Real Image
Editing [23.513950664274997]
Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images.
Their application to editing real images often encounters hurdles due to the text condition deteriorating the reconstruction quality and subsequently affecting editing fidelity.
We present Noise Map Guidance (NMG), an inversion method rich in a spatial context, tailored for real-image editing.
arXiv Detail & Related papers (2024-02-07T07:16:12Z) - Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance in various editing tasks and also maintains a seamless workflow (less than 3 seconds on one single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z) - Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z) - LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance.
This approach achieves versatile edits, both subtle and extensive as well as alterations in composition and style, while requiring no optimization nor extensions to the architecture.
arXiv Detail & Related papers (2023-07-02T09:11:09Z) - Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited by the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.