Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss
- URL: http://arxiv.org/abs/2601.16645v1
- Date: Fri, 23 Jan 2026 11:06:51 GMT
- Title: Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss
- Authors: Minsu Gong, Nuri Ryu, Jungseul Ok, Sunghyun Cho,
- Abstract summary: We propose a novel Structure Preservation Loss (SPL) to quantify structural differences between input and edited images.<n>Our training-free approach integrates SPL directly into the diffusion model's generative process to ensure structural fidelity.<n> Experiments confirm SPL enhances structural fidelity, delivering state-of-the-art performance in latent-diffusion-based image editing.
- Score: 32.26030534230571
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in image editing leverage latent diffusion models (LDMs) for versatile, text-prompt-driven edits across diverse tasks. Yet, maintaining pixel-level edge structures-crucial for tasks such as photorealistic style transfer or image tone adjustment-remains as a challenge for latent-diffusion-based editing. To overcome this limitation, we propose a novel Structure Preservation Loss (SPL) that leverages local linear models to quantify structural differences between input and edited images. Our training-free approach integrates SPL directly into the diffusion model's generative process to ensure structural fidelity. This core mechanism is complemented by a post-processing step to mitigate LDM decoding distortions, a masking strategy for precise edit localization, and a color preservation loss to preserve hues in unedited areas. Experiments confirm SPL enhances structural fidelity, delivering state-of-the-art performance in latent-diffusion-based image editing. Our code will be publicly released at https://github.com/gongms00/SPL.
Related papers
- X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models [3.610796534465868]
Experimental results demonstrate that X-Edit accurately localizes edits in images altered by text-guided diffusion models.<n>This highlights X-Edit's potential as a robust forensic tool for detecting and pinpointing manipulations introduced by advanced image editing techniques.
arXiv Detail & Related papers (2025-05-16T23:29:38Z) - Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing [60.102602955261084]
Implicit Structure Locking (ISLock) is the first training-free editing strategy for AR visual models.<n>Our method preserves structural blueprints by dynamically aligning self-attention patterns with reference images.<n>Our findings pioneer the way for efficient and flexible AR-based image editing, further bridging the performance gap between diffusion and autoregressive generative models.
arXiv Detail & Related papers (2025-04-14T17:25:19Z) - EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM [50.054404519821745]
We present a novel framework that integrates a multimodal Large Language Model for enhanced reasoning capabilities.<n>Our framework achieves promising results on MagicBrush, AutoSplice, and PerfBrush datasets.<n> Notably, our method excels on the PerfBrush dataset, a self-constructed test set featuring previously unseen types of edits.
arXiv Detail & Related papers (2024-12-05T02:05:33Z) - Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps.<n>Our method effectively minimizes distortions caused by varying text conditions during noise prediction.<n> Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
arXiv Detail & Related papers (2024-11-29T12:11:28Z) - Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.<n>Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT)<n>We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.<n>Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z) - Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing [42.73883397041092]
We propose a novel approach that is built upon a modified diffusion sampling process via the guidance mechanism.
In this work, we explore the self-guidance technique to preserve the overall structure of the input image.
We show through human evaluation and quantitative analysis that the proposed method allows to produce desired editing.
arXiv Detail & Related papers (2024-09-02T15:21:46Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - BARET : Balanced Attention based Real image Editing driven by
Target-text Inversion [36.59406959595952]
We propose a novel editing technique that only requires an input image and target text for various editing types including non-rigid edits without fine-tuning diffusion model.
Our method contains three novelties: (I) Targettext Inversion Schedule (TTIS) is designed to fine-tune the input target text embedding to achieve fast image reconstruction without image caption and acceleration of convergence; (II) Progressive Transition Scheme applies progressive linear approaches between target text embedding and its fine-tuned version to generate transition embedding for maintaining non-rigid editing capability; (III) Balanced Attention Module (BAM) balances the tradeoff between textual description and image semantics
arXiv Detail & Related papers (2023-12-09T07:18:23Z) - Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.