Region-Aware Diffusion for Zero-shot Text-driven Image Editing
- URL: http://arxiv.org/abs/2302.11797v1
- Date: Thu, 23 Feb 2023 06:20:29 GMT
- Title: Region-Aware Diffusion for Zero-shot Text-driven Image Editing
- Authors: Nisha Huang, Fan Tang, Weiming Dong, Tong-Yee Lee, Changsheng Xu
- Abstract summary: We propose a novel region-aware diffusion model (RDM) for entity-level image editing.
To strike a balance between image fidelity and inference speed, we design the intensive diffusion pipeline.
The results show that RDM outperforms the previous approaches in terms of visual quality, overall harmonization, non-editing region content preservation, and text-image semantic consistency.
- Score: 78.58917623854079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image manipulation under the guidance of textual descriptions has recently
received a broad range of attention. In this study, we focus on the regional
editing of images with the guidance of given text prompts. Different from
current mask-based image editing methods, we propose a novel region-aware
diffusion model (RDM) for entity-level image editing, which could automatically
locate the region of interest and replace it following given text prompts. To
strike a balance between image fidelity and inference speed, we design the
intensive diffusion pipeline by combining latent space diffusion and enhanced
directional guidance. In addition, to preserve image content in non-edited
regions, we introduce regional-aware entity editing to modify the region of
interest and preserve the out-of-interest region. We validate the proposed RDM
against the baseline methods through extensive qualitative and quantitative
experiments. The results show that RDM outperforms the previous approaches in
terms of visual quality, overall harmonization, non-editing region content
preservation, and text-image semantic consistency. The codes are available at
https://github.com/haha-lisa/RDM-Region-Aware-Diffusion-Model.
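The "enhanced directional guidance" mentioned in the abstract belongs to the family of CLIP directional losses, which align the direction of change in image-embedding space with the direction of change in text-embedding space. The sketch below illustrates that general idea with NumPy on precomputed embeddings; the function name and signature are illustrative assumptions, not RDM's actual API.

```python
import numpy as np

def directional_loss(img_src, img_edit, txt_src, txt_edit, eps=1e-8):
    """CLIP-style directional guidance sketch: penalize misalignment
    between the image-edit direction and the text-edit direction.

    All four arguments are 1-D embedding vectors in a shared
    image-text space (e.g. from a CLIP encoder, precomputed here).
    Returns 1 - cos(similarity), so 0 means perfectly aligned edits.
    """
    d_img = img_edit - img_src   # how the image embedding moved
    d_txt = txt_edit - txt_src   # how the text embedding moved
    cos = np.dot(d_img, d_txt) / (
        np.linalg.norm(d_img) * np.linalg.norm(d_txt) + eps
    )
    return 1.0 - cos
```

In a guided sampler, a gradient of this loss with respect to the current latent would steer each denoising step toward the prompt-specified edit.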
Related papers
- EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM [50.054404519821745]
We present a novel framework that integrates a multimodal Large Language Model for enhanced reasoning capabilities.
Our framework achieves promising results on MagicBrush, AutoSplice, and PerfBrush datasets.
Notably, our method excels on the PerfBrush dataset, a self-constructed test set featuring previously unseen types of edits.
arXiv Detail & Related papers (2024-12-05T02:05:33Z) - Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z) - LocInv: Localization-aware Inversion for Text-Guided Image Editing [17.611103794346857]
Text-guided image editing research aims to empower users to manipulate generated images by altering the text prompts.
Existing image editing techniques are prone to editing over unintentional regions that are beyond the intended target area.
We propose localization-aware Inversion (LocInv), which exploits segmentation maps or bounding boxes as extra localization priors to refine the cross-attention maps.
arXiv Detail & Related papers (2024-05-02T17:27:04Z) - Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, grounded in a theoretical analysis of the role of $\eta$ in the DDIM sampling equation for enhanced editability.
arXiv Detail & Related papers (2024-03-14T15:07:36Z) - LIME: Localized Image Editing via Attention Regularization in Diffusion Models [69.33072075580483]
This paper introduces LIME for localized image editing in diffusion models.
LIME does not require user-specified regions of interest (RoI) or additional text input, but instead employs features from pre-trained methods and a straightforward clustering step to obtain a precise editing mask.
We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits.
arXiv Detail & Related papers (2023-12-14T18:59:59Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z) - DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z) - Blended Diffusion for Text-driven Editing of Natural Images [18.664733153082146]
We introduce the first solution for performing local (region-based) edits in generic natural images.
We achieve our goal by leveraging and combining a pretrained language-image model (CLIP) with a denoising diffusion model.
To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent.
arXiv Detail & Related papers (2021-11-29T18:58:49Z)
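The spatial blending described in the Blended Diffusion entry above can be sketched in a few lines: inside the mask, keep the text-guided latent; outside it, substitute a noised version of the original image at the same timestep. The function name, array shapes, and per-step usage below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def blend_step(latent_edit, image_latent_noised, mask):
    """One spatial-blending step, as in mask-guided diffusion editing.

    latent_edit:         text-guided diffusion latent at timestep t
    image_latent_noised: the original image, noised to the same timestep t
    mask:                1.0 inside the edit region, 0.0 outside

    Inside the mask the text-guided edit proceeds; outside it, the
    original image content (at the matching noise level) is restored,
    so non-edited regions are preserved through every denoising step.
    """
    return mask * latent_edit + (1.0 - mask) * image_latent_noised
```

Applying this blend at every denoising step (rather than once at the end) is what lets the edited region fuse seamlessly with its surroundings.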
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.