Region-Aware Diffusion for Zero-shot Text-driven Image Editing
- URL: http://arxiv.org/abs/2302.11797v1
- Date: Thu, 23 Feb 2023 06:20:29 GMT
- Title: Region-Aware Diffusion for Zero-shot Text-driven Image Editing
- Authors: Nisha Huang, Fan Tang, Weiming Dong, Tong-Yee Lee, Changsheng Xu
- Abstract summary: We propose a novel region-aware diffusion model (RDM) for entity-level image editing.
To strike a balance between image fidelity and inference speed, we design the intensive diffusion pipeline.
The results show that RDM outperforms the previous approaches in terms of visual quality, overall harmonization, non-editing region content preservation, and text-image semantic consistency.
- Score: 78.58917623854079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image manipulation under the guidance of textual descriptions has recently
received a broad range of attention. In this study, we focus on the regional
editing of images with the guidance of given text prompts. Different from
current mask-based image editing methods, we propose a novel region-aware
diffusion model (RDM) for entity-level image editing, which could automatically
locate the region of interest and replace it following given text prompts. To
strike a balance between image fidelity and inference speed, we design the
intensive diffusion pipeline by combining latent space diffusion and enhanced
directional guidance. In addition, to preserve image content in non-edited
regions, we introduce regional-aware entity editing to modify the region of
interest and preserve the out-of-interest region. We validate the proposed RDM
against the baseline methods through extensive qualitative and quantitative
experiments. The results show that RDM outperforms the previous approaches in
terms of visual quality, overall harmonization, non-editing region content
preservation, and text-image semantic consistency. The codes are available at
https://github.com/haha-lisa/RDM-Region-Aware-Diffusion-Model.
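As a rough illustration of the region-preserving mechanism described in the abstract (a minimal numpy sketch, not the authors' released code; all function and parameter names here are hypothetical), one reverse diffusion step can blend a text-guided prediction inside the edit mask with a noised copy of the source latent outside it:

```python
# Minimal sketch of region-aware latent blending (assumed, not the RDM code):
# inside `mask` the latent follows a text-guided reverse step, outside it is
# replaced by the source latent noised to the matching level, so non-edited
# content is preserved.
import numpy as np

def noise_to_step(z0, t, alphas_cumprod, rng):
    """Forward-diffuse a clean latent z0 to timestep t, i.e. sample q(z_t | z_0)."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * rng.standard_normal(z0.shape)

def region_aware_step(z_t, z0_source, mask, t, alphas_cumprod, denoise_fn, rng):
    """One reverse step t -> t-1 that edits inside `mask` and keeps the source outside.

    z_t        : current noisy latent, shape (C, H, W)
    z0_source  : clean latent of the original image, same shape
    mask       : float array in [0, 1], 1 where the entity should be edited
    denoise_fn : text-conditioned reverse step mapping (z_t, t) to z_{t-1}
    """
    z_edit = denoise_fn(z_t, t)                                    # text-guided z_{t-1}
    z_keep = noise_to_step(z0_source, t - 1, alphas_cumprod, rng)  # source at level t-1
    return mask * z_edit + (1.0 - mask) * z_keep
```
In a full pipeline this blending would run at every latent-space denoising step, with the mask located automatically from the text prompt rather than supplied by the user.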
Related papers
- Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z)
- LocInv: Localization-aware Inversion for Text-Guided Image Editing [17.611103794346857]
Text-guided image editing research aims to empower users to manipulate generated images by altering the text prompts.
Existing image editing techniques are prone to editing over unintentional regions that are beyond the intended target area.
We propose localization-aware Inversion (LocInv), which exploits segmentation maps or bounding boxes as extra localization priors to refine the cross-attention maps.
arXiv Detail & Related papers (2024-05-02T17:27:04Z)
- Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, grounded in a theoretical analysis of the role of $\eta$ in the DDIM sampling equation for enhanced editability.
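For reference, the standard DDIM update that this analysis concerns can be written as follows (notation from the DDIM paper, quoted here only to show where $\eta$ enters; it is not part of this abstract):

$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{x_t - \sqrt{1-\bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\, \epsilon_\theta(x_t, t) + \sigma_t \varepsilon_t$, with $\sigma_t = \eta \sqrt{\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}} \sqrt{1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}}$.

Setting $\eta = 0$ gives the deterministic sampler typically used for inversion, while $\eta > 0$ injects fresh noise at each step, so $\eta$ acts as a knob between reconstruction fidelity and editability.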
arXiv Detail & Related papers (2024-03-14T15:07:36Z)
- LIME: Localized Image Editing via Attention Regularization in Diffusion Models [74.3811832586391]
This paper introduces LIME for localized image editing in diffusion models that do not require user-specified regions of interest (RoI) or additional text input.
Our method employs features from pre-trained models and a simple clustering technique to obtain precise semantic segmentation maps.
We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits.
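As a rough, assumed illustration of this kind of attention regularization (not LIME's implementation; the additive-penalty form and all names are placeholders), unrelated prompt tokens can be suppressed inside the RoI before the softmax:

```python
# Sketch: mask cross-attention so that, inside the region of interest, tokens
# unrelated to the edit receive a large negative bias before the softmax.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def regularized_cross_attention(q, k, v, roi_mask, edit_token_ids, penalty=-1e4):
    """
    q              : (N_pixels, d) image-patch queries
    k, v           : (N_tokens, d) text-token keys/values
    roi_mask       : (N_pixels,) bool, True inside the region of interest
    edit_token_ids : indices of the prompt tokens that describe the edit
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (N_pixels, N_tokens)
    unrelated = np.ones(k.shape[0], dtype=bool)
    unrelated[edit_token_ids] = False
    # Inside the RoI, push down attention paid to unrelated tokens.
    scores[np.ix_(roi_mask, unrelated)] += penalty
    return softmax(scores, axis=-1) @ v
```
In practice such a penalty would be applied inside the denoising network's cross-attention layers at each step, so its effect accumulates over the sampling trajectory.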
arXiv Detail & Related papers (2023-12-14T18:59:59Z)
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of denoising score distillation, called Contrastive Denoising Score (CDS), for latent diffusion models (LDM).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
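A simplified sketch of how such a mask could be derived (assumed helper names and a deliberately crude noising step; not the paper's implementation) is to contrast the model's noise predictions under the source and target prompts and threshold the disagreement:

```python
# Sketch: where the denoiser's predictions under two prompts disagree most is
# where the edit must act; threshold that disagreement to obtain a binary mask.
import numpy as np

def estimate_edit_mask(x0, eps_pred, noise_levels, prompt_src, prompt_tgt,
                       rng, threshold=0.5):
    """
    x0           : (C, H, W) input image or latent
    eps_pred     : callable (x_t, t, prompt) -> predicted noise, same shape as x_t
    noise_levels : iterable of timesteps at which to probe the model
    """
    diffs = []
    for t in noise_levels:
        noise = rng.standard_normal(x0.shape)
        x_t = x0 + 0.5 * noise     # crude noising; real code uses the q(x_t | x_0) schedule
        d = eps_pred(x_t, t, prompt_tgt) - eps_pred(x_t, t, prompt_src)
        diffs.append(np.abs(d).mean(axis=0))           # (H, W) disagreement map
    m = np.mean(diffs, axis=0)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)     # normalize to [0, 1]
    return (m > threshold).astype(np.float32)           # binary edit mask
```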
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- Blended Diffusion for Text-driven Editing of Natural Images [18.664733153082146]
We introduce the first solution for performing local (region-based) edits in generic natural images.
We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit toward a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results.
To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent.
arXiv Detail & Related papers (2021-11-29T18:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.