X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models
- URL: http://arxiv.org/abs/2505.11753v1
- Date: Fri, 16 May 2025 23:29:38 GMT
- Title: X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models
- Authors: Valentina Bazyleva, Nicolo Bonettini, Gaurav Bharaj
- Abstract summary: Experimental results demonstrate that X-Edit accurately localizes edits in images altered by text-guided diffusion models. This highlights X-Edit's potential as a robust forensic tool for detecting and pinpointing manipulations introduced by advanced image editing techniques.
- Score: 3.610796534465868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided diffusion models have significantly advanced image editing, enabling highly realistic and localized modifications based on textual prompts. While these developments expand creative possibilities, their malicious use poses substantial challenges for the detection of such subtle deepfake edits. To this end, we introduce Explain Edit (X-Edit), a novel method for localizing diffusion-based edits in images. To localize edits in an image, we invert the image using a pretrained diffusion model, then feed the inverted features to a segmentation network that explicitly predicts the edited regions via channel and spatial attention. Further, we finetune the model using a combined segmentation and relevance loss. The segmentation loss ensures accurate mask prediction by balancing pixel-wise errors and perceptual similarity, while the relevance loss guides the model to focus on low-frequency regions and mitigate high-frequency artifacts, enhancing the localization of subtle edits. To the best of our knowledge, we are the first to address and model the problem of localizing diffusion-based modified regions in images. We additionally contribute a new dataset of paired original and edited images, addressing the current lack of resources for this task. Experimental results demonstrate that X-Edit accurately localizes edits in images altered by text-guided diffusion models, outperforming baselines in PSNR and SSIM metrics. This highlights X-Edit's potential as a robust forensic tool for detecting and pinpointing manipulations introduced by advanced image editing techniques.
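No reference implementation accompanies this abstract, so the following is a minimal PyTorch sketch of the pipeline it describes, under stated assumptions: a CBAM-style attention block, a pooled-L1 stand-in for the perceptual term, a blur-residual stand-in for the low-frequency relevance term, and hypothetical names (ChannelSpatialAttention, EditSegmenter, combined_loss). The inverted features are assumed to come from DDIM inversion of the image with a pretrained diffusion model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel + spatial attention (an assumed design choice)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from globally pooled statistics.
        w = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * w
        # Spatial attention from per-pixel mean/max over channels.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial_conv(s))


class EditSegmenter(nn.Module):
    """Predicts an edit-region mask from inverted diffusion features."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            ChannelSpatialAttention(64),
            nn.Conv2d(64, 1, 1))

    def forward(self, inverted_feats):
        # inverted_feats: features obtained by inverting the input image.
        return torch.sigmoid(self.net(inverted_feats))


def combined_loss(pred, gt, lam=0.5, mu=0.1):
    # Segmentation loss: pixel-wise error plus a coarse structural term
    # standing in for the perceptual-similarity component.
    pixel = F.binary_cross_entropy(pred, gt)
    struct = F.l1_loss(F.avg_pool2d(pred, 8), F.avg_pool2d(gt, 8))
    # Relevance loss: penalize high-frequency residue (what a blur removes),
    # steering predictions toward low-frequency regions.
    high_freq = pred - F.avg_pool2d(pred, 5, stride=1, padding=2)
    return pixel + lam * struct + mu * high_freq.abs().mean()
```

In a real system the structural term would more likely be SSIM or LPIPS and the relevance term a frequency-domain penalty; the stand-ins above only mirror the roles the abstract assigns to each loss.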
Related papers
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models. Our approach is preferred by users 77-90% of the time in user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z)
- EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM [50.054404519821745]
We present a novel framework that integrates a multimodal Large Language Model for enhanced reasoning capabilities. Our framework achieves promising results on MagicBrush, AutoSplice, and PerfBrush datasets. Notably, our method excels on the PerfBrush dataset, a self-constructed test set featuring previously unseen types of edits.
arXiv Detail & Related papers (2024-12-05T02:05:33Z)
- TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts (a hedged sketch of one such extrapolation follows this entry).
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
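The summary above does not spell out the pseudo-guidance formula. Offered purely as an assumption, one plausible shape is a classifier-free-guidance-style extrapolation from the source-prompt noise prediction toward the edit-prompt prediction; the function name and default weight are hypothetical:

```python
import torch


def pseudo_guided_noise(eps_src: torch.Tensor,
                        eps_edit: torch.Tensor,
                        w: float = 2.0) -> torch.Tensor:
    """Amplify the edit direction between two noise predictions.

    Hypothetical sketch: w = 1 reproduces the plain edit, while w > 1
    increases its magnitude without touching the shared component, which
    is one way to raise editing strength. TurboEdit's actual formulation
    may differ.
    """
    return eps_src + w * (eps_edit - eps_src)
```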
- Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves text-guided image editing techniques based on diffusion models.
We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z)
- Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, grounded in a theoretical analysis of the role of $\eta$ in the DDIM sampling equation for enhanced editability (the standard DDIM update in which $\eta$ appears is reproduced after this entry).
arXiv Detail & Related papers (2024-03-14T15:07:36Z)
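For reference, this is the standard DDIM update (Song et al., 2021) in which $\eta$ appears; $\eta = 0$ gives deterministic sampling and $\eta = 1$ recovers DDPM-like stochasticity. It is the published equation, not the paper's optimized $\eta$ function:

```latex
x_{t-1} = \sqrt{\bar\alpha_{t-1}}
          \left( \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}
                      {\sqrt{\bar\alpha_t}} \right)
        + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\,\epsilon_\theta(x_t, t)
        + \sigma_t \varepsilon,
\qquad
\sigma_t = \eta \sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}
                \sqrt{1-\frac{\bar\alpha_t}{\bar\alpha_{t-1}}}.
```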
- LIME: Localized Image Editing via Attention Regularization in Diffusion Models [69.33072075580483]
This paper introduces LIME for localized image editing in diffusion models. LIME does not require user-specified regions of interest (RoI) or additional text input; instead, it employs features from pre-trained methods and a straightforward clustering method to obtain a precise editing mask. We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits (a hedged sketch of such regularization follows this entry).
arXiv Detail & Related papers (2023-12-14T18:59:59Z)
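As an illustration only, a penalty of the kind LIME describes could be applied to pre-softmax cross-attention logits as below; the tensor layout, additive penalty, and function name are assumptions, not LIME's published code:

```python
import torch


def regularize_cross_attention(logits: torch.Tensor,
                               roi_mask: torch.Tensor,
                               unrelated_tokens: torch.Tensor,
                               penalty: float = 10.0) -> torch.Tensor:
    """Down-weight attention from RoI pixels to edit-unrelated tokens.

    logits: (batch, pixels, tokens) pre-softmax cross-attention scores.
    roi_mask: (pixels,) bool, True inside the region being edited.
    unrelated_tokens: (tokens,) bool, True for prompt tokens unrelated
    to the requested edit.
    """
    # Outer product marks the (pixel, token) pairs to penalize; after the
    # softmax their attention collapses toward zero, localizing the edit.
    pairs = roi_mask[:, None] & unrelated_tokens[None, :]
    return logits - penalty * pairs.to(logits.dtype)
```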
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Blended Diffusion for Text-driven Editing of Natural Images [18.664733153082146]
We introduce the first solution for performing local (region-based) edits in generic natural images.
We achieve our goal by leveraging and combining a pretrained language-image model (CLIP) with a denoising diffusion probabilistic model (DDPM).
To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent (a minimal blend-step sketch follows this entry).
arXiv Detail & Related papers (2021-11-29T18:58:49Z)
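A minimal sketch of the blend step this entry describes, assuming pixel-space DDPM notation, a user-supplied mask, and hypothetical variable names:

```python
import torch


def blended_step(x_edit_t: torch.Tensor,
                 x_src: torch.Tensor,
                 mask: torch.Tensor,
                 alpha_bar_t: float) -> torch.Tensor:
    """Fuse the text-guided latent with a noised copy of the source image.

    mask is 1 inside the edited region and 0 elsewhere, so the background
    is re-imposed from the source image at every denoising step t.
    """
    # Forward-diffuse the source image to the current noise level t.
    noise = torch.randn_like(x_src)
    x_src_t = alpha_bar_t ** 0.5 * x_src + (1 - alpha_bar_t) ** 0.5 * noise
    return mask * x_edit_t + (1 - mask) * x_src_t
```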
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.