The Devil is in Attention Sharing: Improving Complex Non-rigid Image Editing Faithfulness via Attention Synergy
- URL: http://arxiv.org/abs/2512.14423v2
- Date: Wed, 17 Dec 2025 06:53:51 GMT
- Title: The Devil is in Attention Sharing: Improving Complex Non-rigid Image Editing Faithfulness via Attention Synergy
- Authors: Zhuo Chen, Fanyue Wei, Runze Xu, Jingjing Li, Lixin Duan, Angela Yao, Wen Li,
- Abstract summary: We introduce SynPS, a method that Synergistically leverages Positional embeddings and Semantic information for faithful non-rigid image editing.<n>We propose an editing measurement that quantifies the required editing magnitude at each denoising step.<n>By adaptively integrating positional and semantic cues, SynPS effectively avoids both over- and under-editing.
- Score: 71.39358554558667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training-free image editing with large diffusion models has become practical, yet faithfully performing complex non-rigid edits (e.g., pose or shape changes) remains highly challenging. We identify a key underlying cause: attention collapse in existing attention sharing mechanisms, where either positional embeddings or semantic features dominate visual content retrieval, leading to over-editing or under-editing. To address this issue, we introduce SynPS, a method that Synergistically leverages Positional embeddings and Semantic information for faithful non-rigid image editing. We first propose an editing measurement that quantifies the required editing magnitude at each denoising step. Based on this measurement, we design an attention synergy pipeline that dynamically modulates the influence of positional embeddings, enabling SynPS to balance semantic modifications and fidelity preservation. By adaptively integrating positional and semantic cues, SynPS effectively avoids both over- and under-editing. Extensive experiments on public and newly curated benchmarks demonstrate the superior performance and faithfulness of our approach.
Related papers
- ProEdit: Inversion-based Editing From Prompts Done Right [63.554692704101]
Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions.<n>Existing methods typically inject source image information during the sampling process to maintain editing consistency.<n>We propose ProEdit to address this issue both in the attention and the latent aspects.
arXiv Detail & Related papers (2025-12-26T18:59:14Z) - LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing [0.276240219662896]
We introduce LORE, a training-free and efficient image editing method.<n>LORE directly optimize the inverted noise, addressing the core limitations in generalization and controllability of existing approaches.<n> Experimental results show that LORE significantly outperforms strong baselines in terms of semantic alignment, image quality, and background fidelity.
arXiv Detail & Related papers (2025-08-05T06:45:04Z) - Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling [1.8876415010297893]
In image editing, two crucial aspects are editability, which determines the extent of modification, and faithfulness, which reflects how well unaltered elements are preserved.<n>We propose Faithfulness Guidance and Scheduling (FGS), which enhances faithfulness with minimal impact on editability.<n> Experimental results demonstrate that FGS achieves superior faithfulness while maintaining editability.
arXiv Detail & Related papers (2025-06-26T06:46:03Z) - AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing [33.74477787349966]
We propose a novel one-step point-based image editing method, named AttentionDrag.<n>This framework enables semantic consistency and high-quality manipulation without the need for extensive re-optimization or retraining.<n>Our results demonstrate a performance that surpasses most state-of-the-art methods with significantly faster speeds.
arXiv Detail & Related papers (2025-06-16T09:42:38Z) - Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model [60.82962950960996]
We introduce UnifyEdit, a tuning-free method that performs diffusion latent optimization.<n>We develop two attention-based constraints: a self-attention (SA) preservation constraint for structural fidelity, and a cross-attention (CA) alignment constraint to enhance text alignment.<n>Our approach achieves a robust balance between structure preservation and text alignment across various editing tasks, outperforming other state-of-the-art methods.
arXiv Detail & Related papers (2025-04-08T01:02:50Z) - Edicho: Consistent Image Editing in the Wild [90.42395533938915]
Edicho steps in with a training-free solution based on diffusion models.<n>It features a fundamental design principle of using explicit image correspondence to direct editing.
arXiv Detail & Related papers (2024-12-30T16:56:44Z) - Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps.<n>Our method effectively minimizes distortions caused by varying text conditions during noise prediction.<n> Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
arXiv Detail & Related papers (2024-11-29T12:11:28Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [44.311286151669464]
We present a novel approach called Tuning-free Inversion-enhanced Control (TIC)
TIC correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction.
We also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes.
arXiv Detail & Related papers (2023-12-22T11:13:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.