Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing
- URL: http://arxiv.org/abs/2401.09794v1
- Date: Thu, 18 Jan 2024 08:26:37 GMT
- Title: Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing
- Authors: Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo
- Abstract summary: We introduce an innovative method that maintains the principles of Null-text Inversion (NTI) while accelerating the image editing process.
We propose the WaveOpt-Estimator, which determines the text optimization endpoint based on frequency characteristics.
This approach maintains performance comparable to NTI while reducing the average editing time by over 80% compared to the NTI method.
- Score: 24.338298020188155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of image editing, Null-text Inversion (NTI) enables fine-grained
editing while preserving the structure of the original image by optimizing null
embeddings during the DDIM sampling process. However, the NTI process is
time-consuming, taking more than two minutes per image. To address this, we
introduce an innovative method that maintains the principles of NTI while
accelerating the image editing process. We propose the WaveOpt-Estimator, which
determines the text optimization endpoint based on frequency characteristics.
Utilizing wavelet transform analysis to identify the image's frequency
characteristics, we can limit text optimization to specific timesteps during
the DDIM sampling process. By adopting the Negative-Prompt Inversion (NPI)
concept, a target prompt representing the original image serves as the initial
text value for optimization. This approach maintains performance comparable to
NTI while reducing the average editing time by over 80% compared to the NTI
method. Our method presents a promising approach for efficient, high-quality
image editing based on diffusion models.
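The abstract describes using wavelet transform analysis to read off an image's frequency characteristics and from them pick an endpoint timestep for text optimization. The paper's actual estimator is not reproduced here; the following is a minimal NumPy sketch under stated assumptions: a one-level 2D Haar decomposition measures the fraction of energy in the high-frequency detail sub-bands, and a hypothetical mapping (the `optimization_endpoint` function and its scaling are illustrative, not from the paper) converts that ratio into a number of optimization timesteps.

```python
import numpy as np

def haar_detail_energy(img):
    """One-level 2D Haar transform; return the fraction of signal
    energy in the detail (high-frequency) sub-bands LH, HL, HH."""
    # Pairwise averages / differences along rows, then columns.
    a = (img[:, ::2] + img[:, 1::2]) / 2.0   # row low-pass
    d = (img[:, ::2] - img[:, 1::2]) / 2.0   # row high-pass
    ll = (a[::2] + a[1::2]) / 2.0            # approximation band
    lh = (a[::2] - a[1::2]) / 2.0
    hl = (d[::2] + d[1::2]) / 2.0
    hh = (d[::2] - d[1::2]) / 2.0
    detail = np.sum(lh**2) + np.sum(hl**2) + np.sum(hh**2)
    total = detail + np.sum(ll**2)
    return detail / total

def optimization_endpoint(img, max_steps=50):
    """Hypothetical mapping (not the paper's): more high-frequency
    content -> optimize the text embedding over more timesteps."""
    ratio = haar_detail_energy(img)
    return int(np.clip(round(ratio * max_steps * 4), 1, max_steps))
```

A perfectly flat image yields a detail-energy ratio of zero (endpoint clamped to 1 step), while noisy, texture-rich images push the endpoint toward `max_steps`; the real WaveOpt-Estimator would replace this heuristic mapping with its learned or calibrated rule.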
Related papers
- PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing [63.38854614997581]
We introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process.
The proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions.
The method is both inversion- and training-free, requiring approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
arXiv Detail & Related papers (2024-10-07T09:04:50Z)
- FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting [18.708185548091716]
FRAP is a simple, yet effective approach based on adaptively adjusting the per-token prompt weights to improve prompt-image alignment and authenticity of the generated images.
We show FRAP generates images with significantly higher prompt-image alignment to prompts from complex datasets.
We also explore combining FRAP with a prompt-rewriting LLM to recover degraded prompt-image alignment.
arXiv Detail & Related papers (2024-08-21T15:30:35Z)
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and thereby significantly accelerates editing.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
arXiv Detail & Related papers (2024-05-24T17:59:26Z)
- TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing [12.504661526518234]
We present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing.
We propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization.
Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth.
arXiv Detail & Related papers (2024-04-17T07:08:38Z)
- Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z)
- Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing [56.536695050042546]
We propose a training-free approach for non-rigid editing with Stable Diffusion.
Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling.
We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality.
arXiv Detail & Related papers (2024-02-13T17:08:35Z)
- MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond [57.14128305383768]
We propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion).
MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks.
arXiv Detail & Related papers (2024-01-06T14:12:16Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- Null-text Inversion for Editing Real Images using Guided Diffusion Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z)
- Deep Unfolded Recovery of Sub-Nyquist Sampled Ultrasound Image [94.42139459221784]
We propose a reconstruction method from sub-Nyquist samples in the time and spatial domain, that is based on unfolding the ISTA algorithm.
Our method allows reducing the number of array elements, sampling rate, and computational time while ensuring high quality imaging performance.
arXiv Detail & Related papers (2021-03-01T19:19:38Z)
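The entry above reconstructs sub-Nyquist samples by unfolding the ISTA algorithm. As background only (this is not the paper's network), a minimal NumPy sketch of classic ISTA for the sparse recovery problem min_x 0.5||Ax - y||^2 + lam*||x||_1; "unfolding" would turn each iteration into a network layer whose step size and threshold are learned rather than fixed as below.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1: shrink magnitudes toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam=0.1, n_iter=200):
    """Classic ISTA: gradient step on 0.5||Ax - y||^2, then
    soft-thresholding. An unfolded network replaces the fixed
    1/L step and lam/L threshold with per-layer learned values."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x
```

With the step size 1/L, each iteration monotonically decreases the objective, which is why a fixed-depth unfolded version (a few layers with learned parameters) can match many plain-ISTA iterations at far lower cost.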
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.