PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
- URL: http://arxiv.org/abs/2410.04844v1
- Date: Mon, 7 Oct 2024 09:04:50 GMT
- Title: PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
- Authors: Feng Tian, Yixuan Li, Yichao Yan, Shanyan Guan, Yanhao Ge, Xiaokang Yang
- Abstract summary: We introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process.
The proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions.
The method is both inversion- and training-free, necessitating approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
- Score: 63.38854614997581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of image editing, three core challenges persist: controllability, background preservation, and efficiency. Inversion-based methods rely on time-consuming optimization to preserve the features of the initial images, which results in low efficiency due to the requirement for extensive network inference. Conversely, inversion-free methods lack theoretical support for background similarity, as they circumvent the issue of maintaining initial features to achieve efficiency. As a consequence, none of these methods can achieve both high efficiency and background consistency. To tackle the challenges and the aforementioned disadvantages, we introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process. Specifically, a corresponding measurement term related to both the initial features and Langevin dynamics is introduced to optimize the estimated image generated by the given target prompt. Extensive experimental results indicate that the proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions. Furthermore, the method is both inversion- and training-free, necessitating approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
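The abstract's idea of steering sampling with a measurement term and Langevin dynamics can be illustrated with a toy sketch. This is not the PostEdit algorithm: it assumes a 1-D standard Gaussian prior in place of a diffusion prior, a noisy observation `y` standing in for the "initial features", and hypothetical names (`posterior_score`, `langevin_sample`, `sigma`, `step_size`) that do not appear in the paper.

```python
import math
import random


def posterior_score(x, y, sigma):
    """Score of p(x | y) for a N(0, 1) prior and y = x + N(0, sigma^2) noise."""
    prior_score = -x                          # d/dx log N(x; 0, 1)
    measurement_score = (y - x) / sigma ** 2  # d/dx log N(y; x, sigma^2)
    return prior_score + measurement_score


def langevin_sample(y, sigma, steps=20000, step_size=1e-2, seed=0):
    """Unadjusted Langevin dynamics targeting the toy posterior p(x | y).

    Update rule: x <- x + eta * score(x) + sqrt(2 * eta) * z, with z ~ N(0, 1).
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from a prior sample
    samples = []
    for t in range(steps):
        z = rng.gauss(0.0, 1.0)
        x = x + step_size * posterior_score(x, y, sigma) + math.sqrt(2.0 * step_size) * z
        if t >= steps // 2:  # discard the first half as burn-in
            samples.append(x)
    return samples


def analytic_posterior_mean(y, sigma):
    """Closed-form posterior mean for this Gaussian toy model."""
    return (y / sigma ** 2) / (1.0 + 1.0 / sigma ** 2)


if __name__ == "__main__":
    draws = langevin_sample(y=2.0, sigma=0.5)
    empirical_mean = sum(draws) / len(draws)
    print("empirical mean:", empirical_mean)
    print("analytic mean: ", analytic_posterior_mean(2.0, 0.5))
```

In this toy setting the measurement term pulls samples toward the observation while the prior term keeps them plausible, which is loosely analogous to preserving the source image while following the target prompt. A smaller `sigma` weights the observation more heavily.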
Related papers
- Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing [42.73883397041092]
We propose a novel approach that is built upon a modified diffusion sampling process via the guidance mechanism.
In this work, we explore the self-guidance technique to preserve the overall structure of the input image.
We show through human evaluation and quantitative analysis that the proposed method produces the desired edits.
arXiv Detail & Related papers (2024-09-02T15:21:46Z)
- TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization [59.412236435627094]
TALE is a training-free framework harnessing the generative capabilities of text-to-image diffusion models.
We equip TALE with two mechanisms dubbed Adaptive Latent Manipulation and Energy-guided Latent Optimization.
Our experiments demonstrate that TALE surpasses prior baselines and attains state-of-the-art performance in image-guided composition.
arXiv Detail & Related papers (2024-08-07T08:52:21Z)
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing [12.504661526518234]
We present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing.
We propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization.
Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth.
arXiv Detail & Related papers (2024-04-17T07:08:38Z)
- Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration.
Our method avoids the need for post-acceleration during inference, thereby avoiding the associated performance deterioration.
Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z)
- Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing [24.338298020188155]
We introduce an innovative method that maintains the principles of the Null-text Inversion (NTI) while accelerating the image editing process.
We propose the Wave-Estimator, which determines the text optimization endpoint based on frequency characteristics.
This approach maintains performance comparable to NTI while reducing the average editing time by over 80% compared to the NTI method.
arXiv Detail & Related papers (2024-01-18T08:26:37Z)
- Adaptive Image Registration: A Hybrid Approach Integrating Deep Learning and Optimization Functions for Enhanced Precision [13.242184146186974]
We propose a single framework for image registration based on deep neural networks and optimization.
We show improvements of up to 1.6% on test data while maintaining the same inference time, and a substantial gain of 1.0 percentage points in deformation field smoothness.
arXiv Detail & Related papers (2023-11-27T02:48:06Z)
- A Simple Baseline for StyleGAN Inversion [133.5868210969111]
StyleGAN inversion plays an essential role in enabling the pretrained StyleGAN to be used for real facial image editing tasks.
Existing optimization-based methods can produce high quality results, but the optimization often takes a long time.
We present a new feed-forward network for StyleGAN inversion, with significant improvement in terms of efficiency and quality.
arXiv Detail & Related papers (2021-04-15T17:59:49Z)
- Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately.
By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
arXiv Detail & Related papers (2020-03-02T13:28:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.