SpotEdit: Selective Region Editing in Diffusion Transformers
- URL: http://arxiv.org/abs/2512.22323v1
- Date: Fri, 26 Dec 2025 14:59:41 GMT
- Title: SpotEdit: Selective Region Editing in Diffusion Transformers
- Authors: Zhibin Qin, Zhenxiong Tan, Zeqing Wang, Songhua Liu, Xinchao Wang
- Abstract summary: SpotEdit is a training-free diffusion editing framework that selectively updates only the modified regions. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
- Score: 66.44912649206553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: Is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features; SpotFusion adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
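The two components map naturally onto a per-timestep token loop. Below is a minimal, hedged sketch of the idea in PyTorch, not the authors' implementation: cosine similarity on token features stands in for the paper's perceptual similarity measure, and `model`, `threshold`, and `alpha` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def select_stable_tokens(cond_feats, cur_feats, threshold=0.9):
    # SpotSelector-style step (illustrative): tokens whose conditional-image
    # features and current denoising features are highly similar are marked
    # stable; their transformer computation can be skipped and the cached
    # conditional features reused. The paper uses a perceptual similarity
    # measure; plain cosine similarity is an assumption here.
    sim = F.cosine_similarity(cond_feats, cur_feats, dim=-1)  # (num_tokens,)
    return sim > threshold  # True = stable, skip computation

def spot_fusion(cond_feats, edited_feats, stable_mask, alpha=0.8):
    # SpotFusion-style step (illustrative): blend reused conditional features
    # with freshly denoised tokens so edited regions stay coherent with their
    # frozen surroundings. A fixed alpha replaces the paper's dynamic weights.
    w = stable_mask.float().unsqueeze(-1) * alpha  # (num_tokens, 1)
    return w * cond_feats + (1.0 - w) * edited_feats

def selective_denoise_step(model, tokens, cond_feats, t):
    # One selective step: denoise only the unstable (edited) tokens.
    stable = select_stable_tokens(cond_feats, tokens)
    edited = tokens.clone()
    if (~stable).any():
        edited[~stable] = model(tokens[~stable], t)  # hypothetical model call
    return spot_fusion(cond_feats, edited, stable)
```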
Related papers
- FusionEdit: Semantic Fusion and Attention Modulation for Training-Free Image Editing [7.53296048773288]
Text-guided image editing aims to modify specific regions according to the target prompt while preserving the identity of the source image.
Recent methods exploit explicit binary masks to constrain editing, but hard mask boundaries introduce artifacts and reduce editability.
We propose FusionEdit, a training-free image editing framework that achieves precise and controllable edits.
arXiv Detail & Related papers (2026-02-09T14:34:18Z)
- FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing [52.54102743380658]
We propose FlowDC, which decouples a complex edit into multiple sub-editing effects and superposes them in parallel during the editing process.
FlowDC shows superior results compared with existing methods.
arXiv Detail & Related papers (2025-12-12T09:08:39Z)
- SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder [52.754326452329956]
We introduce a method for disentangled and continuous editing through token-level manipulation of text embeddings.
The edits are applied by moving the embeddings along carefully chosen directions, which control the strength of the target attribute.
Our method operates directly on text embeddings without modifying the diffusion process, making it model-agnostic and broadly applicable to various image backbones.
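A direction-plus-strength edit of this kind reduces to a one-line vector operation. The following sketch is illustrative only: the attribute direction is assumed given (in the paper it would come from the sparse autoencoder), and the dimensions are placeholders.

```python
import torch

def edit_token_embedding(token_emb, direction, strength):
    # Push one token's text embedding along a pre-computed attribute
    # direction; strength gives continuous control over the attribute.
    direction = direction / direction.norm()
    return token_emb + strength * direction

# Usage sketch: sweep strength for a continuous edit.
emb = torch.randn(768)       # placeholder token embedding
attr_dir = torch.randn(768)  # placeholder attribute direction
edited = [edit_token_embedding(emb, attr_dir, s) for s in (0.0, 0.5, 1.0)]
```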
arXiv Detail & Related papers (2025-10-06T17:51:04Z)
- NEP: Autoregressive Image Editing via Next Editing Token Prediction [16.69384738678215]
We propose to formulate image editing as Next Editing-token Prediction (NEP), built on autoregressive image generation.
Our model naturally supports test-time scaling (TTS) by iteratively refining its generation in a zero-shot manner.
arXiv Detail & Related papers (2025-08-08T06:06:34Z)
- EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing [47.68813248789496]
We propose a framework, named EEdit, to achieve efficient image editing.
Experiments demonstrate an average 2.46× acceleration without performance drop across a wide range of editing tasks.
arXiv Detail & Related papers (2025-03-13T11:26:45Z)
- LoMOE: Localized Multi-Object Editing via Multi-Diffusion [8.90467024388923]
We introduce a novel framework for zero-shot localized multi-object editing through a multi-diffusion process.
Our approach leverages foreground masks and corresponding simple text prompts that exert localized influences on the target regions.
A combination of cross-attention and background losses within the latent space ensures that the characteristics of the object being edited are preserved.
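For the background-preservation part, a latent-space loss of the following shape is one plausible reading; this sketch is an assumption, and LoMOE's cross-attention term and exact weighting are not reproduced here.

```python
import torch

def background_loss(z_edit, z_src, fg_mask):
    # Penalize drift of the edited latent from the source latent outside
    # the union of foreground masks, keeping untargeted regions intact.
    # z_edit, z_src: (1, C, H, W) latents; fg_mask: (1, 1, H, W) in {0, 1}.
    bg = 1.0 - fg_mask
    return ((z_edit - z_src) ** 2 * bg).sum() / bg.sum().clamp(min=1.0)
```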
arXiv Detail & Related papers (2024-03-01T10:46:47Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
- ZONE: Zero-Shot Instruction-Guided Local Editing [56.56213730578504]
We propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE.
We first convert the editing intent from the user-provided instruction into specific image editing regions through InstructPix2Pix.
We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segmentation model.
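A Region-IoU selection step can be sketched as scoring each candidate segmentation mask against the instruction-derived edit region and keeping the best match; the derivation of `edit_region` and all names below are assumptions, not ZONE's actual API.

```python
import torch

def region_iou(edit_region, seg_mask):
    # IoU between the edit region (e.g. from diffing the source image
    # against the InstructPix2Pix output, assumed) and one candidate mask.
    inter = (edit_region & seg_mask).sum().float()
    union = (edit_region | seg_mask).sum().float()
    return inter / union.clamp(min=1.0)

def pick_layer(edit_region, seg_masks):
    # seg_masks: list of boolean (H, W) masks from an off-the-shelf
    # segmentation model; return the mask best covering the edit region.
    scores = torch.stack([region_iou(edit_region, m) for m in seg_masks])
    return seg_masks[int(scores.argmax())]
```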
arXiv Detail & Related papers (2023-12-28T02:54:34Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the capacities of pretrained diffusion models for image editing.
Existing approaches either finetune the model or invert the image into the latent space of the pretrained model.
Both suffer from two problems: unsatisfying results in selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)