Related papers: FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

URL: http://arxiv.org/abs/2404.11895v1
Date: Thu, 18 Apr 2024 04:47:28 GMT
Title: FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
Authors: Wei Wu, Qingnan Fan, Shuai Qin, Hong Gu, Ruoyu Zhao, Antoni B. Chan,
Abstract summary: We introduce a novel free approach that employs progressive $textbfFre$qu$textbfe$ncy truncation to refine the guidance of $textbfDiff$usion models for universal editing tasks. Our method achieves comparable results with state-of-the-art methods across a variety of editing tasks and on a diverse set of images.
Score: 44.26371926512843
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face the pivotal challenge of misalignment between the intended precise editing target regions and the broader area impacted by the guidance in practice. Despite excellent methods leveraging attention mechanisms that have been developed to refine the editing guidance, these approaches necessitate modifications through complex network architecture and are limited to specific editing tasks. In this work, we re-examine the diffusion process and misalignment problem from a frequency perspective, revealing that, due to the power law of natural images and the decaying noise schedule, the denoising network primarily recovers low-frequency image components during the earlier timesteps and thus brings excessive low-frequency signals for editing. Leveraging this insight, we introduce a novel fine-tuning free approach that employs progressive $\textbf{Fre}$qu$\textbf{e}$ncy truncation to refine the guidance of $\textbf{Diff}$usion models for universal editing tasks ($\textbf{FreeDiff}$). Our method achieves comparable results with state-of-the-art methods across a variety of editing tasks and on a diverse set of images, highlighting its potential as a versatile tool in image editing applications.

Related papers

Training-Free Text-Guided Image Editing with Visual Autoregressive Model [46.201510044410995]
We propose a novel text-guided image editing framework based on Visual AutoRegressive modeling. Our method eliminates the need for explicit inversion while ensuring precise and controlled modifications. Our framework operates in a training-free manner and achieves high-fidelity editing with faster inference speeds.
arXiv Detail & Related papers (2025-03-31T09:46:56Z)
Edicho: Consistent Image Editing in the Wild [90.42395533938915]
Edicho steps in with a training-free solution based on diffusion models. It features a fundamental design principle of using explicit image correspondence to direct editing.
arXiv Detail & Related papers (2024-12-30T16:56:44Z)
Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps. Our method effectively minimizes distortions caused by varying text conditions during noise prediction. Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
arXiv Detail & Related papers (2024-11-29T12:11:28Z)
Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation. Despite their robust generative capabilities, these models often suffer from inaccurate inversion, which could limit their effectiveness in downstream tasks such as image and video editing. We propose RF-r, a novel training-free sampler that enhances inversion precision by reducing errors in the process of solving rectified flow ODEs.
arXiv Detail & Related papers (2024-11-07T14:29:02Z)
Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing [42.45138713525929]
Effective editing requires inverting the source image into a latent space, a process often hindered by prediction errors inherent in DDIM inversion. We introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing. Our approach requires no additional retraining and is compatible with various existing editing methods.
arXiv Detail & Related papers (2024-10-24T14:07:02Z)
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing [42.73883397041092]
We propose a novel approach that is built upon a modified diffusion sampling process via the guidance mechanism. In this work, we explore the self-guidance technique to preserve the overall structure of the input image. We show through human evaluation and quantitative analysis that the proposed method allows to produce desired editing.
arXiv Detail & Related papers (2024-09-02T15:21:46Z)
Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks. ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z)
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the edit-friendly'' DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
Streamlining Image Editing with Layered Diffusion Brushes [8.738398948669609]
Our system renders a single edit on a 512x512 image within 140 ms using a high-end consumer GPU. Our approach demonstrates efficacy across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation.
arXiv Detail & Related papers (2024-05-01T04:30:03Z)
Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image. Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image. We introduce a novel and adaptable diffusion inversion technique for real image editing, which is grounded in a theoretical analysis of the role of $eta$ in the DDIM sampling equation for enhanced editability.
arXiv Detail & Related papers (2024-03-14T15:07:36Z)
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing. Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
LIME: Localized Image Editing via Attention Regularization in Diffusion Models [74.3811832586391]
This paper introduces LIME for localized image editing in diffusion models that do not require user-specified regions of interest (RoI) or additional text input. Our method employs features from pre-trained methods and a simple clustering technique to obtain precise semantic segmentation maps. We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits.
arXiv Detail & Related papers (2023-12-14T18:59:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.