Addressing Attribute Leakages in Diffusion-based Image Editing without Training
- URL: http://arxiv.org/abs/2412.04715v3
- Date: Thu, 12 Dec 2024 04:32:38 GMT
- Title: Addressing Attribute Leakages in Diffusion-based Image Editing without Training
- Authors: Sunung Mun, Jinhwan Nam, Sunghyun Cho, Jungseul Ok
- Abstract summary: ALE-Edit is a novel framework to minimize attribute leakage with three components. We introduce ALE-Bench, a benchmark for evaluating attribute leakage with new metrics for target-external and target-internal leakage.
- Score: 18.85055192982783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have become a cornerstone in image editing, offering flexibility with language prompts and source images. However, a key challenge is attribute leakage, where unintended modifications occur in non-target regions or within target regions due to attribute interference. Existing methods often suffer from leakage due to naive text embeddings and inadequate handling of End-of-Sequence (EOS) token embeddings. To address this, we propose ALE-Edit (Attribute-leakage-free editing), a novel framework to minimize attribute leakage with three components: (1) Object-Restricted Embeddings (ORE) to localize object-specific attributes in text embeddings, (2) Region-Guided Blending for Cross-Attention Masking (RGB-CAM) to align attention with target regions, and (3) Background Blending (BB) to preserve non-edited regions. Additionally, we introduce ALE-Bench, a benchmark for evaluating attribute leakage with new metrics for target-external and target-internal leakage. Experiments demonstrate that our framework significantly reduces attribute leakage while maintaining high editing quality, providing an efficient and tuning-free solution for multi-object image editing.
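The core masking idea behind a component like RGB-CAM can be illustrated with a minimal sketch: restrict each object token's cross-attention to that object's target region, then renormalize. This is an illustration of the general technique, not the paper's implementation; the function name, the token-to-object mapping, and the shapes are assumptions.

```python
import numpy as np

def masked_cross_attention(attn, token_to_object, region_masks):
    """Zero each object token's attention outside its target region,
    then renormalize so every pixel's attention still sums to 1.

    attn:            (num_pixels, num_tokens) cross-attention map
    token_to_object: per-token object id, or None for global tokens (e.g. EOS)
    region_masks:    object id -> boolean (num_pixels,) region mask
    """
    out = attn.copy()
    for t, obj in enumerate(token_to_object):
        if obj is None:
            continue  # leave global tokens untouched in this sketch
        out[~region_masks[obj], t] = 0.0
    denom = out.sum(axis=1, keepdims=True)
    denom[denom == 0.0] = 1.0  # avoid division by zero for fully masked pixels
    return out / denom
```

Renormalizing after masking keeps each pixel's attention a valid distribution over tokens, so downstream softmax-weighted value aggregation still behaves as expected.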
Related papers
- CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing [24.68304617869157]
Context-Preserving Adaptive Manipulation (CPAM) is a novel framework for complicated, non-rigid real image editing.
We develop a preservation adaptation module that adjusts self-attention mechanisms to preserve and independently control the object and background effectively.
We also introduce various mask-guidance strategies to facilitate diverse image manipulation tasks in a simple manner.
arXiv Detail & Related papers (2025-06-23T09:19:38Z)
- MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models [10.798205956644317]
We propose MDE-Edit, a training-free, inference-stage optimization approach that enables precise localized image manipulation in complex multi-object scenes.
Extensive experiments demonstrate that MDE-Edit outperforms state-of-the-art methods in editing accuracy and visual quality, offering a robust solution for complex multi-object image manipulation tasks.
arXiv Detail & Related papers (2025-05-08T10:01:14Z)
- LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing [6.057289837472806]
Text-guided image editing aims to modify specific regions of an image according to natural language instructions.
Since cross-attention mechanisms focus on semantic relevance, they struggle to maintain image integrity.
We introduce LOCATEdit, which enhances cross-attention maps through a graph-based approach.
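One way to read "enhances cross-attention maps through a graph-based approach" is as Laplacian smoothing of the map over the pixel grid, so attention mass is spatially coherent. The sketch below shows a single smoothing step on a 4-neighbour graph; it is an illustrative assumption, not LOCATEdit's actual graph construction or optimization.

```python
import numpy as np

def laplacian_smooth(attn_map, lam=0.5):
    """One step of graph-Laplacian smoothing of a 2-D attention map on the
    4-neighbour pixel grid: blend each pixel with its neighbours' mean."""
    padded = np.pad(attn_map, 1, mode="edge")  # replicate borders
    neighbour_mean = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    return (1.0 - lam) * attn_map + lam * neighbour_mean
```

Iterating this step diffuses isolated spikes in the attention map into smooth, contiguous regions, which is the kind of spatial consistency a localized edit needs.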
arXiv Detail & Related papers (2025-03-27T14:32:17Z)
- Lost in Edits? A $λ$-Compass for AIGC Provenance [119.95562081325552]
We propose a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones.
LambdaTracer is effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix or performed manually with editing software such as Adobe Photoshop.
arXiv Detail & Related papers (2025-02-05T06:24:25Z)
- CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing [41.92598830147057]
A novel data utilization strategy is introduced to construct datasets consisting of attribute-text triples from a data-driven perspective.
A Skin Transition Frequency Guidance technique is introduced for the local modeling of contextual causality.
arXiv Detail & Related papers (2024-12-18T07:33:22Z)
- DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting [63.01425442236011]
We present DreamMix, a diffusion-based generative model adept at inserting target objects into scenes at user-specified locations.
We propose an Attribute Decoupling Mechanism (ADM) and a Textual Attribute Substitution (TAS) module to improve the diversity and discriminative capability of the text-based attribute guidance.
arXiv Detail & Related papers (2024-11-26T08:44:47Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
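The early-step interpolation of attention features between source and target can be sketched as a simple schedule; the function name, the linear ramp, and the warm-up fraction are assumptions for illustration, not DiffUHaul's exact schedule.

```python
import numpy as np

def blend_attention(src_feat, tgt_feat, step, total_steps, warmup_frac=0.3):
    """Linearly move from source attention features to target features during
    the first `warmup_frac` of denoising steps, then use the target only."""
    warmup = warmup_frac * total_steps
    if step >= warmup:
        return tgt_feat
    alpha = step / warmup  # 0 at the first step, 1 at the end of warm-up
    return (1.0 - alpha) * src_feat + alpha * tgt_feat
```

Blending only during early steps matches the observation that layout is decided early in denoising, while later steps refine appearance.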
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing [3.852667054327356]
We introduce FlexEdit, a flexible and controllable editing framework for objects.
We iteratively adjust latents at each denoising step using our FlexEdit block.
Our framework employs an adaptive mask, automatically extracted during denoising, to protect the background.
arXiv Detail & Related papers (2024-03-27T14:24:30Z)
- LoMOE: Localized Multi-Object Editing via Multi-Diffusion [8.90467024388923]
We introduce a novel framework for zero-shot localized multi-object editing through a multi-diffusion process.
Our approach leverages foreground masks and corresponding simple text prompts that exert localized influences on the target regions.
A combination of cross-attention and background losses within the latent space ensures that the characteristics of the object being edited are preserved.
arXiv Detail & Related papers (2024-03-01T10:46:47Z)
- LIME: Localized Image Editing via Attention Regularization in Diffusion Models [69.33072075580483]
This paper introduces LIME for localized image editing in diffusion models.
LIME does not require user-specified regions of interest (RoI) or additional text input; instead, it employs features from pre-trained methods and a straightforward clustering method to obtain a precise editing mask.
We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits.
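The regularization described in this entry can be sketched as a penalty on the attention mass that tokens unrelated to the edit place inside the RoI; the function name and shapes below are illustrative assumptions, not LIME's exact loss.

```python
import numpy as np

def roi_attention_penalty(attn, roi_mask, related_tokens):
    """Sum of cross-attention that tokens *unrelated* to the edit place inside
    the region of interest. Adding this term to the guidance objective
    discourages unrelated tokens from influencing pixels in the RoI.

    attn:           (num_pixels, num_tokens) cross-attention map
    roi_mask:       boolean (num_pixels,) editing region
    related_tokens: indices of tokens that should act inside the RoI
    """
    unrelated = np.ones(attn.shape[1], dtype=bool)
    unrelated[list(related_tokens)] = False
    return float(attn[roi_mask][:, unrelated].sum())
```

Minimizing this quantity during denoising leaves the related tokens free to act inside the RoI while suppressing everything else there, which is one concrete reading of "penalizes unrelated cross-attention scores in the RoI".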
arXiv Detail & Related papers (2023-12-14T18:59:59Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Regularizing Self-training for Unsupervised Domain Adaptation via Structural Constraints [14.593782939242121]
We propose to incorporate structural cues from auxiliary modalities, such as depth, to regularise conventional self-training objectives.
Specifically, we introduce a contrastive pixel-level objectness constraint that pulls the pixel representations within a region of an object instance closer.
We show that our regularizer significantly improves top performing self-training methods in various UDA benchmarks for semantic segmentation.
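The pixel-level objectness constraint in this entry can be sketched as a pull term toward each instance's mean embedding. This is a minimal reading of "pulls the pixel representations within a region of an object instance closer"; a full contrastive loss would also push representations of different instances apart, which is omitted here.

```python
import numpy as np

def objectness_pull_loss(feats, instance_ids):
    """Mean squared distance of each pixel embedding to its instance mean.

    feats:        (num_pixels, dim) pixel embeddings
    instance_ids: (num_pixels,) integer instance labels
    """
    total, count = 0.0, 0
    for i in np.unique(instance_ids):
        f = feats[instance_ids == i]
        total += ((f - f.mean(axis=0)) ** 2).sum()
        count += f.shape[0]
    return total / count
```

The loss is zero exactly when every instance's pixels share one embedding, so minimizing it tightens within-instance clusters as the abstract describes.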
arXiv Detail & Related papers (2023-04-29T00:12:26Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.
They either finetune the model, or invert the image in the latent space of the pretrained model.
They suffer from two problems: Unsatisfying results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.