PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
- URL: http://arxiv.org/abs/2306.16894v1
- Date: Wed, 28 Jun 2023 11:10:20 GMT
- Title: PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
- Authors: Wenjing Huang, Shikui Tu, Lei Xu
- Abstract summary: PFB-Diff is a Progressive Feature Blending method for Diffusion-based image editing.
Our method demonstrates its superior performance in terms of image fidelity, editing accuracy, efficiency, and faithfulness to the original image.
- Score: 8.19063619210761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have showcased their remarkable capability to synthesize
diverse and high-quality images, sparking interest in their application for
real image editing. However, existing diffusion-based approaches for local
image editing often suffer from undesired artifacts due to the pixel-level
blending of the noised target images and diffusion latent variables, which lack
the necessary semantics for maintaining image consistency. To address these
issues, we propose PFB-Diff, a Progressive Feature Blending method for
Diffusion-based image editing. Unlike previous methods, PFB-Diff seamlessly
integrates text-guided generated content into the target image through
multi-level feature blending. The rich semantics encoded in deep features and
the progressive blending scheme from high to low levels ensure semantic
coherence and high quality in edited images. Additionally, we introduce an
attention masking mechanism in the cross-attention layers to confine the impact
of specific words to desired regions, further improving the performance of
background editing. PFB-Diff can effectively address various editing tasks,
including object/background replacement and object attribute editing. Our
method demonstrates its superior performance in terms of image fidelity,
editing accuracy, efficiency, and faithfulness to the original image, without
the need for fine-tuning or training.
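To make the two mechanisms described above concrete, the following is a minimal PyTorch sketch of (a) blending U-Net decoder features of the text-guided branch with features of the original image inside a down-sampled edit mask, and (b) masking cross-attention so that selected prompt tokens can only attend within the edit region. The function names, tensor shapes, and the high-to-low blending schedule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def blend_features(edit_feat, src_feat, mask, level, max_level):
    """Blend edited and source feature maps at one U-Net decoder level.

    edit_feat, src_feat: (B, C, H, W) features from the text-guided branch
        and the original-image branch at the same level.
    mask: (B, 1, h, w) float mask in {0, 1}; 1 marks the region to edit.
    level / max_level: deeper levels (smaller H, W) receive a stronger blend
        in this toy high-to-low schedule (an assumption, not the paper's).
    """
    mask = F.interpolate(mask, size=edit_feat.shape[-2:], mode="nearest")
    alpha = 1.0 - level / max_level
    # Outside the mask keep the source features; inside, mix in edited features.
    return (1.0 - mask * alpha) * src_feat + mask * alpha * edit_feat


def masked_cross_attention(q, k, v, token_mask, region_mask):
    """Cross-attention in which selected prompt tokens act only inside a region.

    q: (B, N, d) image-feature queries (N = H*W spatial positions).
    k, v: (B, T, d) text keys/values for T prompt tokens.
    token_mask: (T,) bool, True for the word(s) to confine (e.g. the new object).
    region_mask: (B, N) bool, True where those words are allowed to attend.
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, N, T)
    # Block confined tokens at spatial positions outside the allowed region.
    blocked = token_mask.view(1, 1, -1) & ~region_mask.unsqueeze(-1)
    scores = scores.masked_fill(blocked, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

In a full pipeline these two operations would sit inside the U-Net decoder blocks and cross-attention layers and be applied at several resolutions during each denoising step; the sketch only shows a single level.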
Related papers
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z) - Streamlining Image Editing with Layered Diffusion Brushes [8.738398948669609]
Our system renders a single edit on a 512x512 image within 140 ms using a high-end consumer GPU.
Our approach demonstrates efficacy across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation.
arXiv Detail & Related papers (2024-05-01T04:30:03Z) - Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing [47.71851180196975]
Tuning-free Text-guided Image Editing (TIE) is of greater importance for application developers.
We conduct an in-depth probing analysis and demonstrate that cross-attention maps in Stable Diffusion often contain object attribution information.
In contrast, self-attention maps play a crucial role in preserving the geometric and shape details of the source image.
arXiv Detail & Related papers (2024-03-06T03:32:56Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - BARET: Balanced Attention based Real image Editing driven by Target-text Inversion [36.59406959595952]
We propose a novel editing technique that requires only an input image and a target text for various editing types, including non-rigid edits, without fine-tuning the diffusion model.
Our method contains three novelties: (I) the Target-text Inversion Schedule (TTIS) is designed to fine-tune the input target text embedding to achieve fast image reconstruction without an image caption and to accelerate convergence; (II) a Progressive Transition Scheme applies progressive linear interpolation between the target text embedding and its fine-tuned version to generate transition embeddings that maintain non-rigid editing capability; (III) the Balanced Attention Module (BAM) balances the trade-off between textual description and image semantics.
arXiv Detail & Related papers (2023-12-09T07:18:23Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z) - DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting regions of the input image that need to be edited (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
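As a rough illustration of the automatic-mask idea mentioned above, the sketch below contrasts the model's noise predictions under the source and target prompts and thresholds their disagreement. A diffusers-style unet/scheduler interface, the number of probing steps, and the threshold value are assumptions for illustration, not the paper's exact procedure.

```python
import torch


@torch.no_grad()
def estimate_edit_mask(unet, scheduler, latents, src_emb, tgt_emb,
                       n_steps=10, threshold=0.5):
    """Rough latent-space edit mask: threshold the disagreement between noise
    predictions conditioned on the source and target prompt embeddings."""
    diffs = []
    for t in scheduler.timesteps[:n_steps]:
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)
        eps_src = unet(noisy, t, encoder_hidden_states=src_emb).sample
        eps_tgt = unet(noisy, t, encoder_hidden_states=tgt_emb).sample
        # Regions where the two prompts imply different denoising directions
        # are the ones most likely to need editing.
        diffs.append((eps_tgt - eps_src).abs().mean(dim=1, keepdim=True))
    diff = torch.stack(diffs).mean(dim=0)
    diff = (diff - diff.min()) / (diff.max() - diff.min() + 1e-8)
    return (diff > threshold).float()
```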
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.