Improving Tuning-Free Real Image Editing with Proximal Guidance
- URL: http://arxiv.org/abs/2306.05414v3
- Date: Thu, 6 Jul 2023 01:40:21 GMT
- Title: Improving Tuning-Free Real Image Editing with Proximal Guidance
- Authors: Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei
Ren, Ruijiang Gao, Anastasis Stathopoulos, Xiaoxiao He, Yuxiao Chen, Di Liu,
Qilong Zhangli, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris
Metaxas
- Abstract summary: Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories under larger CFG scales.
NPI offers a training-free closed-form solution of NTI, but it may introduce artifacts and is still constrained by DDIM reconstruction quality.
We extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process.
- Score: 21.070356480624397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DDIM inversion has revealed the remarkable potential of real image editing
within diffusion-based methods. However, the accuracy of DDIM reconstruction
degrades as larger classifier-free guidance (CFG) scales are used for
enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align
the reconstruction and inversion trajectories with larger CFG scales, enabling
real image editing with cross-attention control. Negative-prompt inversion
(NPI) further offers a training-free closed-form solution of NTI. However, it
may introduce artifacts and is still constrained by DDIM reconstruction
quality. To overcome these limitations, we propose proximal guidance and
incorporate it into NPI with cross-attention control. We enhance NPI with a
regularization term and reconstruction guidance, which reduces artifacts while
capitalizing on its training-free nature. Additionally, we extend the concepts
to incorporate mutual self-attention control, enabling geometry and layout
alterations in the editing process. Our method provides an efficient and
straightforward approach, effectively addressing real image editing tasks with
minimal computational overhead.
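The abstract's core mechanism, combining the CFG noise prediction with a regularization of the conditional/unconditional difference, can be sketched as follows. This is a minimal illustration, assuming soft thresholding as the proximal operator; the function names and the NumPy arrays standing in for noise predictions are illustrative, not the paper's actual implementation.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the L1 norm: shrinks entries toward zero and
    # zeroes out components smaller than lam in magnitude.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def cfg_with_proximal_guidance(eps_uncond, eps_cond, scale, lam):
    # Standard classifier-free guidance, except the conditional/unconditional
    # noise difference is passed through a proximal step first, suppressing
    # small, artifact-prone components of the edit direction.
    diff = soft_threshold(eps_cond - eps_uncond, lam)
    return eps_uncond + scale * diff
```

With `lam = 0`, this reduces to ordinary CFG; larger `lam` keeps the sampling trajectory closer to the DDIM reconstruction while still applying the edit.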
Related papers
- Lost in Edits? A $λ$-Compass for AIGC Provenance [119.95562081325552]
We propose a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones.
LambdaTracer is effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix or performed manually with editing software such as Adobe Photoshop.
arXiv Detail & Related papers (2025-02-05T06:24:25Z) - Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps.
Our method effectively minimizes distortions caused by varying text conditions during noise prediction.
Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
arXiv Detail & Related papers (2024-11-29T12:11:28Z) - Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers like FLUX and OpenSora have demonstrated outstanding performance in the field of image and video generation.
Despite their robust generative capabilities, these models often struggle with inversion inaccuracies.
We propose RF-r, a training-free sampler that effectively enhances inversion precision by mitigating the errors in the inversion process of rectified flow.
arXiv Detail & Related papers (2024-11-07T14:29:02Z) - SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing [27.81211305463269]
We propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework.
Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
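The disentangled-scale idea summarized above amounts to running ordinary CFG with a branch-specific scale. A minimal sketch, with illustrative names and default scale values that are assumptions rather than the paper's settings:

```python
def disentangled_cfg(eps_uncond, eps_cond, branch, source_scale=1.0, target_scale=7.5):
    # A small scale on the source (reconstruction) branch keeps the DDIM
    # inversion trajectory accurate, while a larger scale on the target
    # (editing) branch preserves editing strength.
    scale = source_scale if branch == "source" else target_scale
    return eps_uncond + scale * (eps_cond - eps_uncond)
```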
arXiv Detail & Related papers (2024-09-16T17:10:50Z) - Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [44.311286151669464]
We present a novel approach called Tuning-free Inversion-enhanced Control (TIC).
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
arXiv Detail & Related papers (2023-12-22T11:13:22Z) - Effective Real Image Editing with Accelerated Iterative Diffusion Inversion [6.335245465042035]
It is still challenging to edit and manipulate natural images with modern generative models.
Existing approaches that have tackled the problem of inversion stability often incur significant trade-offs in computational efficiency.
We propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity.
arXiv Detail & Related papers (2023-09-10T01:23:05Z) - ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing [20.39792009151017]
StyleGAN allows for flexible and plausible editing of generated images by manipulating the semantic-rich latent style space.
Projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability.
We propose a novel two-phase framework by designating two separate networks to tackle editing and reconstruction respectively.
arXiv Detail & Related papers (2023-01-31T04:38:42Z) - Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can be intuitively composited from the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z) - High-Fidelity GAN Inversion for Image Attribute Editing [61.966946442222735]
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved.
With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images.
We propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction.
arXiv Detail & Related papers (2021-09-14T11:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.