Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints
- URL: http://arxiv.org/abs/2512.21637v1
- Date: Thu, 25 Dec 2025 11:38:10 GMT
- Title: Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints
- Authors: Mutiara Shabrina, Nova Kurnia Putri, Jefri Satria Ferdiansyah, Sabita Khansa Dewi, Novanto Yudistira
- Abstract summary: Text-driven image manipulation often suffers from attribute entanglement, where modifying a target attribute unintentionally alters other semantic properties such as identity or appearance. The Predict, Prevent, and Evaluate framework addresses this issue by leveraging pre-trained vision-language models for disentangled editing. Experimental results demonstrate that the proposed approach enforces more focused and controlled edits, effectively reducing unintended changes in non-target attributes while preserving facial identity.
- Score: 2.4140502941897544
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text-driven image manipulation often suffers from attribute entanglement, where modifying a target attribute (e.g., adding bangs) unintentionally alters other semantic properties such as identity or appearance. The Predict, Prevent, and Evaluate (PPE) framework addresses this issue by leveraging pre-trained vision-language models for disentangled editing. In this work, we analyze the PPE framework, focusing on its architectural components, including BERT-based attribute prediction and StyleGAN2-based image generation on the CelebA-HQ dataset. Through empirical analysis, we identify a limitation in the original regularization strategy, where latent updates remain dense and prone to semantic leakage. To mitigate this issue, we introduce a sparsity-based constraint using L1 regularization on latent space manipulation. Experimental results demonstrate that the proposed approach enforces more focused and controlled edits, effectively reducing unintended changes in non-target attributes while preserving facial identity.
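The abstract's key change is replacing a dense latent update with an L1-constrained one. A minimal sketch of the idea, assuming a proximal-gradient (soft-thresholding) formulation; the latent dimensionality, learning rate, and penalty weight are illustrative assumptions, not the paper's actual hyperparameters:

```python
import numpy as np

# Hypothetical sketch of an L1 sparsity constraint on a latent edit.
# `w` stands in for a StyleGAN2-style latent code and `grad_edit` for the
# gradient of an attribute-editing loss; shapes and values are illustrative.

rng = np.random.default_rng(0)
w = rng.normal(size=512)            # latent code (e.g. a W-space vector)
grad_edit = rng.normal(size=512)    # gradient pushing toward the target attribute

def l1_step(delta, grad, lr=0.1, lam=0.5):
    """One proximal gradient step: a gradient update followed by
    soft-thresholding, which zeroes out small components of the edit."""
    delta = delta - lr * grad
    return np.sign(delta) * np.maximum(np.abs(delta) - lr * lam, 0.0)

delta = np.zeros_like(w)
for _ in range(50):
    delta = l1_step(delta, grad_edit)

# The L1 penalty drives many latent dimensions of the edit to exactly zero,
# confining the change to a few directions and limiting semantic leakage.
sparsity = np.mean(delta == 0.0)
print(f"fraction of untouched latent dims: {sparsity:.2f}")
```

Dimensions whose gradient magnitude stays below the threshold are never touched, which is the mechanism by which non-target attributes are left intact.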
Related papers
- ASemConsist: Adaptive Semantic Feature Control for Training-Free Identity-Consistent Generation [14.341691123354195]
ASemConsist enables explicit semantic control over character identity without sacrificing prompt alignment. Our framework achieves state-of-the-art performance, effectively overcoming prior trade-offs.
arXiv Detail & Related papers (2025-12-29T07:06:57Z)
- LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing [0.276240219662896]
We introduce LORE, a training-free and efficient image editing method. LORE directly optimizes the inverted noise, addressing the core limitations in generalization and controllability of existing approaches. Experimental results show that LORE significantly outperforms strong baselines in terms of semantic alignment, image quality, and background fidelity.
arXiv Detail & Related papers (2025-08-05T06:45:04Z)
- Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model [60.82962950960996]
We introduce UnifyEdit, a tuning-free method that performs diffusion latent optimization. We develop two attention-based constraints: a self-attention (SA) preservation constraint for structural fidelity, and a cross-attention (CA) alignment constraint to enhance text alignment. Our approach achieves a robust balance between structure preservation and text alignment across various editing tasks, outperforming other state-of-the-art methods.
arXiv Detail & Related papers (2025-04-08T01:02:50Z)
- Addressing Text Embedding Leakage in Diffusion-based Image Editing [33.1686050396517]
We introduce Attribute-Leakage-free Editing (ALE), a framework that tackles attribute leakage at its source. ALE combines Object-Restricted Embeddings (ORE) to disentangle text embeddings, Region-Guided Blending for Cross-Attention Masking (RGB-CAM) for spatially precise attention, and Background Blending (BB) to preserve non-edited content.
arXiv Detail & Related papers (2024-12-06T02:10:07Z)
- Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing [26.086806549826058]
Text-guided image editing seeks the preservation of core elements in the source image while implementing modifications based on the target text. Existing metrics have a context-blindness problem, indiscriminately applying the same evaluation criteria to completely different pairs of source image and target text. We propose AugCLIP, a context-aware metric that adaptively coordinates preservation and modification aspects.
arXiv Detail & Related papers (2024-10-15T08:12:54Z)
- Instilling Multi-round Thinking to Text-guided Image Generation [72.2032630115201]
Single-round generation often overlooks crucial details, particularly in the realm of fine-grained changes like shoes or sleeves.
We introduce a new self-supervised regularization, i.e., multi-round regularization, which is compatible with existing methods.
It builds upon the observation that the modification order generally should not affect the final result.
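The observation that edit order should not matter can be turned into a penalty term. A toy sketch (not the paper's implementation) with two hypothetical edit operators on a latent vector; the operators and their parameters are made up for illustration:

```python
import numpy as np

# Illustrative order-invariance regularizer: `edit_a` and `edit_b` are
# hypothetical stand-ins for two text-guided edit operators on a latent z.

def edit_a(z):
    return np.tanh(z + 0.5)   # toy nonlinear edit, e.g. "change shoes"

def edit_b(z):
    return np.tanh(z - 0.3)   # toy nonlinear edit, e.g. "change sleeves"

def order_regularizer(z):
    """Squared distance between the two application orders; zero exactly
    when the edits commute at z, which the regularizer encourages."""
    return np.sum((edit_a(edit_b(z)) - edit_b(edit_a(z))) ** 2)

z = np.linspace(-1.0, 1.0, 8)
print(f"order-sensitivity penalty: {order_regularizer(z):.4f}")
```

Minimizing such a penalty during training would push the model toward edits whose final result is independent of the order in which they are applied.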
arXiv Detail & Related papers (2024-01-16T16:19:58Z)
- AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing [24.9487669818162]
We propose a spatio-temporal guided adaptive editing algorithm, AdapEdit, which realizes adaptive image editing.
Our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning, extra data, or optimization.
We present results over a wide variety of raw images and editing instructions, demonstrating that AdapEdit significantly outperforms previous approaches.
arXiv Detail & Related papers (2023-12-13T09:45:58Z)
- Text Attribute Control via Closed-Loop Disentanglement [72.2786244367634]
We propose a novel approach that achieves robust control of attributes while enhancing content preservation. We use a semi-supervised contrastive learning method to encourage the disentanglement of attributes in latent spaces.
We conducted experiments on three text datasets, including the Yelp Service review dataset, the Amazon Product review dataset, and the GoEmotions dataset.
arXiv Detail & Related papers (2023-12-01T01:26:38Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- ManiCLIP: Multi-Attribute Face Manipulation from Text [104.30600573306991]
We present a novel multi-attribute face manipulation method based on textual descriptions.
Our method generates natural manipulated faces with minimal text-irrelevant attribute editing.
arXiv Detail & Related papers (2022-10-02T07:22:55Z)
- Towards Counterfactual Image Manipulation via CLIP [106.94502632502194]
Existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images.
We investigate this problem in a text-driven manner with Contrastive Language-Image Pre-training (CLIP).
We design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives.
arXiv Detail & Related papers (2022-07-06T17:02:25Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)