Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image
Editing
- URL: http://arxiv.org/abs/2401.09794v1
- Date: Thu, 18 Jan 2024 08:26:37 GMT
- Title: Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image
Editing
- Authors: Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo
- Abstract summary: We introduce an innovative method that maintains the principles of the Null-text Inversion (NTI) while accelerating the image editing process.
We propose the WaveOpt-Estimator, which determines the text optimization endpoint based on frequency characteristics.
This approach maintains performance comparable to NTI while reducing the average editing time by over 80% compared to the NTI method.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of image editing, Null-text Inversion (NTI) enables fine-grained
editing while preserving the structure of the original image by optimizing null
embeddings during the DDIM sampling process. However, the NTI process is
time-consuming, taking more than two minutes per image. To address this, we
introduce an innovative method that maintains the principles of the NTI while
accelerating the image editing process. We propose the WaveOpt-Estimator, which
determines the text optimization endpoint based on frequency characteristics.
Utilizing wavelet transform analysis to identify the image's frequency
characteristics, we can limit text optimization to specific timesteps during
the DDIM sampling process. By adopting the Negative-Prompt Inversion (NPI)
concept, a target prompt representing the original image serves as the initial
text value for optimization. This approach maintains performance comparable to
NTI while reducing the average editing time by over 80% compared to the NTI
method. Our method presents a promising approach for efficient, high-quality
image editing based on diffusion models.
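The core idea in the abstract — use an image's frequency content, measured via a wavelet transform, to decide how many timesteps of text optimization are actually needed — can be sketched in a few lines. The code below is an illustrative assumption, not the paper's WaveOpt-Estimator: it computes a one-level 2D Haar decomposition with plain NumPy and maps the share of energy in the high-frequency detail subbands to a hypothetical optimization endpoint. The names `high_freq_ratio` and `estimate_opt_endpoint`, and the linear ratio-to-timestep mapping with gain `k`, are all invented for illustration.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar wavelet transform.

    Returns the approximation subband LL and the detail subbands
    (LH, HL, HH), which capture the image's high-frequency content.
    Assumes img has even height and width.
    """
    # Average/difference along rows (horizontal pass).
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Average/difference along columns (vertical pass).
    LL = (a[0::2, :] + a[1::2, :]) / 2.0
    LH = (a[0::2, :] - a[1::2, :]) / 2.0
    HL = (d[0::2, :] + d[1::2, :]) / 2.0
    HH = (d[0::2, :] - d[1::2, :]) / 2.0
    return LL, (LH, HL, HH)

def high_freq_ratio(img):
    """Fraction of total subband energy in the detail (high-frequency) bands."""
    LL, (LH, HL, HH) = haar_dwt2(img)
    detail = np.sum(LH**2) + np.sum(HL**2) + np.sum(HH**2)
    return detail / (detail + np.sum(LL**2))

def estimate_opt_endpoint(img, total_steps=50, k=2.0):
    """Hypothetical endpoint rule: images richer in high frequencies get
    more optimization timesteps; smooth images stop early."""
    r = high_freq_ratio(img)
    return int(np.clip(round(k * r * total_steps), 1, total_steps))
```

Under this sketch, a flat image yields a ratio of 0 and stops after a single optimization step, while a checkerboard (maximal high-frequency content) yields a ratio of 0.5 and uses the full budget — the same qualitative behavior the abstract describes, where text optimization is limited to a subset of the DDIM sampling timesteps.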
Related papers
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and hence significantly promotes editing speeds.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
arXiv Detail & Related papers (2024-05-24T17:59:26Z)
- TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing [12.504661526518234]
We present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing.
We propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization.
Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth.
arXiv Detail & Related papers (2024-04-17T07:08:38Z)
- Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z)
- Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, which is grounded in a theoretical analysis of the role of $\eta$ in the DDIM sampling equation for enhanced editability.
arXiv Detail & Related papers (2024-03-14T15:07:36Z)
- Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing [60.65516454338772]
We propose a training-free approach for non-rigid editing with Stable Diffusion.
Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling.
We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality.
arXiv Detail & Related papers (2024-02-13T17:08:35Z)
- MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond [57.14128305383768]
We propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion)
MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks.
arXiv Detail & Related papers (2024-01-06T14:12:16Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- High-Fidelity Guided Image Synthesis with Latent Diffusion Models [50.39294302741698]
Human user study results show that the proposed approach outperforms the previous state-of-the-art by over 85.32% on the overall user satisfaction scores.
arXiv Detail & Related papers (2022-11-30T15:43:20Z)
- Null-text Inversion for Editing Real Images using Guided Diffusion Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z)
- Deep Unfolded Recovery of Sub-Nyquist Sampled Ultrasound Image [94.42139459221784]
We propose a reconstruction method from sub-Nyquist samples in the time and spatial domain, that is based on unfolding the ISTA algorithm.
Our method allows reducing the number of array elements, sampling rate, and computational time while ensuring high quality imaging performance.
arXiv Detail & Related papers (2021-03-01T19:19:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.