Related papers: SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

URL: http://arxiv.org/abs/2409.10476v1
Date: Mon, 16 Sep 2024 17:10:50 GMT
Title: SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing
Authors: Qi Qian, Haiyang Xu, Ming Yan, Juhua Hu
Abstract summary: We propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework. Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
Score: 27.81211305463269
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, vanilla DDIM inversion is not optimized for classifier-free guidance, and the accumulated error results in undesired performance. While many algorithms have been developed to improve the framework of DDIM inversion for editing, in this work we investigate the approximation error in DDIM inversion and propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework. Moreover, a better guidance scale (i.e., 0.5) than the default settings can be derived theoretically. Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
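As a rough illustration of what disentangling the guidance scale could look like, the sketch below applies the standard classifier-free guidance combination, eps = eps_uncond + scale * (eps_cond - eps_uncond), separately to a source branch and a target branch, each with its own scale. It is not the authors' implementation. The function names, the use of plain Python lists for noise predictions, the assignment of 0.5 to the source branch, and the target value 7.5 are all assumptions made for illustration; the abstract only states that the two scales are decoupled and that 0.5 is derived as a better scale than the defaults.

```python
from typing import List, Sequence, Tuple


def cfg_noise(eps_uncond: Sequence[float],
              eps_cond: Sequence[float],
              scale: float) -> List[float]:
    """Standard classifier-free guidance combination, elementwise.

    scale = 0 returns the unconditional prediction,
    scale = 1 returns the conditional prediction.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def two_branch_noise(eps_uncond: Sequence[float],
                     eps_source: Sequence[float],
                     eps_target: Sequence[float],
                     source_scale: float = 0.5,   # assumed: the 0.5 from the abstract
                     target_scale: float = 7.5    # assumed: an illustrative default
                     ) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: guide the source and target branches
    with independent scales instead of one shared scale."""
    source = cfg_noise(eps_uncond, eps_source, source_scale)
    target = cfg_noise(eps_uncond, eps_target, target_scale)
    return source, target


if __name__ == "__main__":
    # Toy two-element "noise predictions" standing in for model outputs.
    src, tgt = two_branch_noise([0.0, 1.0], [2.0, 3.0], [4.0, 1.0])
    print(src, tgt)
```

In an actual editing pipeline the three inputs would be noise predictions from a diffusion model under the empty prompt, the source prompt, and the target prompt; the toy lists here only exercise the arithmetic.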

Related papers

Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers like FLUX and OpenSora have demonstrated outstanding performance in the field of image and video generation. Despite their robust generative capabilities, these models often struggle with inaccuracies. We propose RF-r, a training-free sampler that effectively enhances inversion precision by mitigating the errors in the inversion process of rectified flow.
arXiv Detail & Related papers (2024-11-07T14:29:02Z)
Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images. Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z)
Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks. TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z)
Deep Implicit Optimization enables Robust Learnable Features for Deformable Image Registration [20.34181966545357]
Existing Deep Learning in Image Registration (DLIR) methods do not explicitly incorporate optimization as a layer in a deep network. We show that our method bridges the gap between statistical learning and optimization by explicitly incorporating optimization as a layer in a deep network. Our framework shows excellent performance on in-domain datasets, and is agnostic to domain shift.
arXiv Detail & Related papers (2024-06-11T15:28:48Z)
MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond [57.14128305383768]
We propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion). MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks.
arXiv Detail & Related papers (2024-01-06T14:12:16Z)
Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [44.311286151669464]
We present a novel approach called Tuning-free Inversion-enhanced Control (TIC). TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction. We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
arXiv Detail & Related papers (2023-12-22T11:13:22Z)
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Denoising Score (CUT) for latent diffusion models (LDM). Our approach enables zero-shot image-to-image translation and neural field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
Effective Real Image Editing with Accelerated Iterative Diffusion Inversion [6.335245465042035]
It is still challenging to edit and manipulate natural images with modern generative models. Existing approaches that tackle the problem of inversion stability often incur significant trade-offs in computational efficiency. We propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity.
arXiv Detail & Related papers (2023-09-10T01:23:05Z)
Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation unfolding network (DCUNet) for restoring light field (LF) images captured under low-light conditions. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z)
Improving Tuning-Free Real Image Editing with Proximal Guidance [21.070356480624397]
Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales. Negative-prompt inversion (NPI) offers a training-free closed-form solution of NTI, but it may introduce artifacts and is still constrained by DDIM reconstruction quality. We extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process.
arXiv Detail & Related papers (2023-06-08T17:57:18Z)
End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method. It enables plug-and-play guidance by optimizing diffusion latents. It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z)
EDICT: Exact Diffusion Inversion via Coupled Transformations [13.996171129586731]
Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem. We propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors.
arXiv Detail & Related papers (2022-11-22T18:02:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.