Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
- URL: http://arxiv.org/abs/2312.14611v1
- Date: Fri, 22 Dec 2023 11:13:22 GMT
- Title: Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
- Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong
Fei, Mingyuan Fan, Junshi Huang
- Abstract summary: We present a novel approach called Tuning-free Inversion-enhanced Control (TIC).
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
- Score: 44.311286151669464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consistent editing of real images is a challenging task, as it requires
performing non-rigid edits (e.g., changing postures) to the main objects in the
input image without changing their identity or attributes. To guarantee
consistent attributes, some existing methods fine-tune the entire model or the
textual embedding for structural consistency, but they are time-consuming and
fail to perform non-rigid edits. Other works are tuning-free, but their
performances are weakened by the quality of Denoising Diffusion Implicit Model
(DDIM) reconstruction, which often fails in real-world scenarios. In this
paper, we present a novel approach called Tuning-free Inversion-enhanced
Control (TIC), which directly correlates features from the inversion process
with those from the sampling process to mitigate the inconsistency in DDIM
reconstruction. Specifically, our method effectively obtains inversion features
from the key and value features in the self-attention layers, and enhances the
sampling process by these inversion features, thus achieving accurate
reconstruction and content-consistent editing. To extend the applicability of
our method to general editing scenarios, we also propose a mask-guided
attention concatenation strategy that combines contents from both the inversion
and the naive DDIM editing processes. Experiments show that the proposed method
outperforms previous works in reconstruction and consistent editing, and
produces impressive results in various settings.
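The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The first function lets sampling-branch queries attend to keys and values cached from the inversion process. The second concatenates inversion and editing keys/values and uses a mask to decide which inversion tokens are visible. All function names, the array shapes, and in particular the way the mask is applied (as an additive attention bias over inversion tokens) are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, bias=None):
    # Scaled dot-product attention for (tokens, dim) arrays.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if bias is not None:
        scores = scores + bias
    return softmax(scores) @ v

def inversion_enhanced_attention(q_sample, k_inv, v_inv):
    # Hypothetical helper: sampling-branch queries attend to the
    # keys/values recorded at the same layer and step during inversion.
    return attention(q_sample, k_inv, v_inv)

def mask_guided_concat_attention(q, k_inv, v_inv, k_edit, v_edit, inv_mask):
    # Hypothetical helper: attend over the concatenation of inversion
    # and editing keys/values. inv_mask is a boolean array over the
    # inversion tokens; False hides a token from every query.
    # ASSUMPTION: the paper's actual masking rule may differ.
    k = np.concatenate([k_inv, k_edit], axis=0)
    v = np.concatenate([v_inv, v_edit], axis=0)
    visible = np.concatenate([inv_mask, np.ones(k_edit.shape[0], dtype=bool)])
    bias = np.where(visible, 0.0, -1e9)[None, :]
    return attention(q, k, v, bias)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8))        # 4 query tokens, dim 8
    k_inv = rng.normal(size=(6, 8))    # cached from inversion
    v_inv = rng.normal(size=(6, 8))
    k_edit = rng.normal(size=(6, 8))   # from the plain DDIM editing branch
    v_edit = rng.normal(size=(6, 8))
    mask = np.array([True, True, True, False, False, False])
    out = mask_guided_concat_attention(q, k_inv, v_inv, k_edit, v_edit, mask)
    print(out.shape)
```

With the mask set to all `False`, the second function reduces to ordinary attention over the editing branch; with all `True`, every query sees both branches.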
Related papers
- Latent Inversion with Timestep-aware Sampling for Training-free
Non-rigid Editing [60.65516454338772]
We propose a training-free approach for non-rigid editing with Stable Diffusion.
Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling.
We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality.
arXiv Detail & Related papers (2024-02-13T17:08:35Z)
- Noise Map Guidance: Inversion with Spatial Context for Real Image
Editing [23.513950664274997]
Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images.
Their application to editing real images often encounters hurdles due to the text condition deteriorating the reconstruction quality and subsequently affecting editing fidelity.
We present Noise Map Guidance (NMG), an inversion method rich in a spatial context, tailored for real-image editing.
arXiv Detail & Related papers (2024-02-07T07:16:12Z)
- BARET: Balanced Attention based Real image Editing driven by
Target-text Inversion [36.59406959595952]
We propose a novel editing technique that requires only an input image and target text for various editing types, including non-rigid edits, without fine-tuning the diffusion model.
Our method contains three novelties: (I) a Target-text Inversion Schedule (TTIS), which fine-tunes the input target text embedding to achieve fast image reconstruction without an image caption and to accelerate convergence; (II) a Progressive Transition Scheme, which applies progressive linear approaches between the target text embedding and its fine-tuned version to generate a transition embedding that maintains non-rigid editing capability; (III) a Balanced Attention Module (BAM), which balances the tradeoff between textual description and image semantics.
arXiv Detail & Related papers (2023-12-09T07:18:23Z)
- Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance in various editing tasks and also maintains a seamless workflow (less than 3 seconds on one single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z)
- Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z)
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance.
This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
arXiv Detail & Related papers (2023-07-02T09:11:09Z)
- Improving Tuning-Free Real Image Editing with Proximal Guidance [21.070356480624397]
Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales.
NPI offers a training-free closed-form solution of NTI, but it may introduce artifacts and is still constrained by DDIM reconstruction quality.
We extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process.
arXiv Detail & Related papers (2023-06-08T17:57:18Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, as well as qualitatively, when editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited by the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space
Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.