Improving Tuning-Free Real Image Editing with Proximal Guidance
- URL: http://arxiv.org/abs/2306.05414v3
- Date: Thu, 6 Jul 2023 01:40:21 GMT
- Title: Improving Tuning-Free Real Image Editing with Proximal Guidance
- Authors: Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei
Ren, Ruijiang Gao, Anastasis Stathopoulos, Xiaoxiao He, Yuxiao Chen, Di Liu,
Qilong Zhangli, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris
Metaxas
- Abstract summary: Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales.
NPI offers a training-free closed-form solution of NTI, but it may introduce artifacts and is still constrained by DDIM reconstruction quality.
We extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process.
- Score: 21.070356480624397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DDIM inversion has revealed the remarkable potential of real image editing
within diffusion-based methods. However, the accuracy of DDIM reconstruction
degrades as larger classifier-free guidance (CFG) scales are used for
enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align
the reconstruction and inversion trajectories with larger CFG scales, enabling
real image editing with cross-attention control. Negative-prompt inversion
(NPI) further offers a training-free closed-form solution of NTI. However, it
may introduce artifacts and is still constrained by DDIM reconstruction
quality. To overcome these limitations, we propose proximal guidance and
incorporate it into NPI with cross-attention control. We enhance NPI with a
regularization term and reconstruction guidance, which reduces artifacts while
capitalizing on its training-free nature. Additionally, we extend the concepts
to incorporate mutual self-attention control, enabling geometry and layout
alterations in the editing process. Our method provides an efficient and
straightforward approach, effectively addressing real image editing tasks with
minimal computational overhead.
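As a rough illustration of the proximal guidance idea, the sketch below applies a thresholding proximal operator to the CFG difference term before recombining it. This is a minimal sketch only, assuming a diffusers-style ε-prediction UNet; the helper names and the `lam` threshold are illustrative, and the paper's exact operator choice and threshold schedule may differ:

```python
import torch

def prox_soft(x: torch.Tensor, lam: float) -> torch.Tensor:
    # Soft-thresholding proximal operator (of the L1 norm): shrinks the
    # guidance term toward zero, suppressing small, noisy disagreements
    # between the conditional and unconditional noise predictions.
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

def prox_hard(x: torch.Tensor, lam: float) -> torch.Tensor:
    # Hard-thresholding alternative: keep large entries, zero the rest.
    return torch.where(x.abs() > lam, x, torch.zeros_like(x))

def proximal_cfg(eps_uncond: torch.Tensor,
                 eps_cond: torch.Tensor,
                 guidance_scale: float = 7.5,
                 lam: float = 0.05,
                 hard: bool = False) -> torch.Tensor:
    # Classifier-free guidance with a proximal function applied to the
    # (conditional - unconditional) difference. Under NPI, eps_uncond is
    # produced by feeding the source prompt in place of the null text.
    diff = eps_cond - eps_uncond
    diff = prox_hard(diff, lam) if hard else prox_soft(diff, lam)
    return eps_uncond + guidance_scale * diff
```

The reconstruction guidance mentioned in the abstract (nudging the sampling latents back toward the stored inversion latents) is omitted from this sketch.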
Related papers
- InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models [20.90990477016161]
We introduce Geometry-Inverse-Meet-Pixel-Insert, short for GEO, an exceptionally versatile image editing technique.
Our approach seamlessly integrates text prompts and image prompts to yield diverse and precise editing outcomes.
arXiv Detail & Related papers (2024-09-18T06:43:40Z) - SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing [27.81211305463269]
We propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework.
Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
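As a rough sketch of SimInversion's disentangled guidance scales (hypothetical helper names; the scale values shown are common defaults, not the paper's tuned settings), the source and target branches are combined with separate CFG weights:

```python
import torch

def cfg(eps_uncond: torch.Tensor, eps_cond: torch.Tensor,
        scale: float) -> torch.Tensor:
    # Standard classifier-free guidance combination.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def dual_branch_guidance(eps_u_src, eps_c_src, eps_u_tgt, eps_c_tgt,
                         w_src: float = 1.0, w_tgt: float = 7.5):
    # Disentangled scales: a small w_src keeps the source (reconstruction)
    # branch close to the DDIM inversion trajectory, reducing inversion
    # error, while a large w_tgt keeps the edit on the target branch strong.
    eps_src = cfg(eps_u_src, eps_c_src, w_src)
    eps_tgt = cfg(eps_u_tgt, eps_c_tgt, w_tgt)
    return eps_src, eps_tgt
```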
arXiv Detail & Related papers (2024-09-16T17:10:50Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [44.311286151669464]
We present a novel approach called Tuning-free Inversion-enhanced Control (TIC).
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
arXiv Detail & Related papers (2023-12-22T11:13:22Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer that keep the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - Effective Real Image Editing with Accelerated Iterative Diffusion
Inversion [6.335245465042035]
It is still challenging to edit and manipulate natural images with modern generative models.
Existing approaches that have tackled the problem of inversion stability often incur significant trade-offs in computational efficiency.
We propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity.
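A minimal sketch of the fixed-point view behind AIDI's iterative inversion (assuming cumulative-product ᾱ noise schedules and an ε-prediction model; the helper names are hypothetical, and AIDI's acceleration techniques are omitted):

```python
import torch

@torch.no_grad()
def inversion_step_fixed_point(x_prev: torch.Tensor, t: int,
                               eps_model, alpha_bar_prev: torch.Tensor,
                               alpha_bar_t: torch.Tensor,
                               n_iters: int = 5) -> torch.Tensor:
    # DDIM inversion is implicit: the noise should be evaluated at the
    # unknown noisier latent x_t. Naive inversion evaluates it at x_prev
    # instead; iterating the update refines x_t toward self-consistency.
    x_t = x_prev  # initial guess, as in naive DDIM inversion
    for _ in range(n_iters):
        eps = eps_model(x_t, t)
        x0 = (x_prev - (1 - alpha_bar_prev).sqrt() * eps) \
             / alpha_bar_prev.sqrt()
        x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * eps
    return x_t
```

Each extra iteration costs one additional model evaluation per timestep, which is where the "minimal additional overhead" trade-off comes in.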
arXiv Detail & Related papers (2023-09-10T01:23:05Z) - ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing [20.39792009151017]
StyleGAN allows for flexible and plausible editing of generated images by manipulating the semantically rich latent style space.
Projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability.
We propose a novel two-phase framework by designating two separate networks to tackle editing and reconstruction respectively.
arXiv Detail & Related papers (2023-01-31T04:38:42Z) - Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and
Editability [76.6724135757723]
GAN inversion aims to invert an input image into the latent space of a pre-trained GAN.
Despite the recent advances in GAN inversion, there remain challenges to mitigate the tradeoff between distortion and editability.
We propose a two-step approach that first inverts the input image into a latent code, called pivot code, and then alters the generator so that the input image can be accurately mapped into the pivot code.
arXiv Detail & Related papers (2022-07-19T16:10:16Z) - Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited by the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z) - High-Fidelity GAN Inversion for Image Attribute Editing [61.966946442222735]
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved.
With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images.
We propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction.
arXiv Detail & Related papers (2021-09-14T11:23:48Z)