LatentEdit: Adaptive Latent Control for Consistent Semantic Editing
- URL: http://arxiv.org/abs/2509.00541v1
- Date: Sat, 30 Aug 2025 15:47:03 GMT
- Title: LatentEdit: Adaptive Latent Control for Consistent Semantic Editing
- Authors: Siyi Liu, Weiming Chen, Yushun Tang, Zhihai He
- Abstract summary: LatentEdit is an adaptive latent fusion framework that combines the current latent code with a reference latent code inverted from the source image. Our proposed LatentEdit achieves an optimal balance between fidelity and editability, outperforming state-of-the-art methods even with only 8-15 steps.
- Score: 24.414252461549555
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion-based image editing has achieved significant success in recent years. However, it remains challenging to achieve high-quality image editing while maintaining background similarity without sacrificing speed or memory efficiency. In this work, we introduce LatentEdit, an adaptive latent fusion framework that dynamically combines the current latent code with a reference latent code inverted from the source image. By selectively preserving source features in high-similarity, semantically important regions while generating target content in other regions guided by the target prompt, LatentEdit enables fine-grained, controllable editing. Critically, the method requires no internal model modifications or complex attention mechanisms, offering a lightweight, plug-and-play solution compatible with both UNet-based and DiT-based architectures. Extensive experiments on the PIE-Bench dataset demonstrate that our proposed LatentEdit achieves an optimal balance between fidelity and editability, outperforming state-of-the-art methods even with only 8-15 denoising steps. Additionally, its inversion-free variant further halves the number of neural function evaluations and eliminates the need to store any intermediate variables, substantially enhancing real-time deployment efficiency.
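The core idea described in the abstract, fusing the current denoising latent with a reference latent inverted from the source image according to a per-region similarity mask, can be illustrated with a minimal sketch. The function name, the thresholded binary mask, and the array shapes below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fuse_latents(z_current, z_reference, similarity, threshold=0.8):
    """Blend the current denoising latent with a reference latent
    (inverted from the source image). Regions whose similarity score
    meets the threshold keep the source features; the rest keep the
    target-prompt-guided content. Names and the hard-mask form are
    illustrative, not the authors' exact method."""
    # Spatial mask: 1 where source features should be preserved.
    mask = (similarity >= threshold).astype(z_current.dtype)
    return mask * z_reference + (1.0 - mask) * z_current

# Toy example on 1x4x4 latent maps.
rng = np.random.default_rng(0)
z_cur = rng.standard_normal((1, 4, 4))
z_ref = rng.standard_normal((1, 4, 4))
sim_high = np.full((1, 4, 4), 0.9)   # uniformly high similarity
fused = fuse_latents(z_cur, z_ref, sim_high)
```

With uniformly high similarity the fused latent reduces to the reference (full background preservation); with uniformly low similarity it reduces to the current latent (full editability). A soft (continuous) mask would interpolate between the two.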
Related papers
- The Devil is in Attention Sharing: Improving Complex Non-rigid Image Editing Faithfulness via Attention Synergy [71.39358554558667]
We introduce SynPS, a method that synergistically leverages positional embeddings and semantic information for faithful non-rigid image editing. We propose an editing measurement that quantifies the required editing magnitude at each denoising step. By adaptively integrating positional and semantic cues, SynPS effectively avoids both over- and under-editing.
arXiv Detail & Related papers (2025-12-16T14:08:00Z) - VALA: Learning Latent Anchors for Training-Free and Temporally Consistent [29.516179213427694]
We propose VALA, a variational alignment module that adaptively selects key frames and compresses their latent features into semantic anchors for consistent video editing. Our method can be fully integrated into training-free text-to-image based video editing models.
arXiv Detail & Related papers (2025-10-27T03:44:11Z) - LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing [0.276240219662896]
We introduce LORE, a training-free and efficient image editing method. LORE directly optimizes the inverted noise, addressing the core limitations in generalization and controllability of existing approaches. Experimental results show that LORE significantly outperforms strong baselines in terms of semantic alignment, image quality, and background fidelity.
arXiv Detail & Related papers (2025-08-05T06:45:04Z) - Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models [1.9389881806157316]
In this work, we propose a novel framework that enhances image inversion using consistency models. Our method introduces a cycle-consistency optimization strategy that significantly improves reconstruction accuracy. We achieve state-of-the-art performance across various image editing tasks and datasets.
arXiv Detail & Related papers (2025-06-23T20:34:43Z) - AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing [33.74477787349966]
We propose a novel one-step point-based image editing method, named AttentionDrag. This framework enables semantic consistency and high-quality manipulation without the need for extensive re-optimization or retraining. Our results demonstrate performance that surpasses most state-of-the-art methods at significantly faster speeds.
arXiv Detail & Related papers (2025-06-16T09:42:38Z) - Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model [60.82962950960996]
We introduce UnifyEdit, a tuning-free method that performs diffusion latent optimization. We develop two attention-based constraints: a self-attention (SA) preservation constraint for structural fidelity, and a cross-attention (CA) alignment constraint to enhance text alignment. Our approach achieves a robust balance between structure preservation and text alignment across various editing tasks, outperforming other state-of-the-art methods.
arXiv Detail & Related papers (2025-04-08T01:02:50Z) - Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT). We propose an automatic method to identify "vital layers" within DiT, crucial for image formation. Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z) - PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing [63.38854614997581]
We introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process. The proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions. The method is both inversion- and training-free, requiring approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
arXiv Detail & Related papers (2024-10-07T09:04:50Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited by the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.