EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
- URL: http://arxiv.org/abs/2503.10270v2
- Date: Sun, 30 Mar 2025 11:14:17 GMT
- Title: EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
- Authors: Zexuan Yan, Yue Ma, Chang Zou, Wenteng Chen, Qifeng Chen, Linfeng Zhang
- Abstract summary: We propose a framework, named EEdit, to achieve efficient image editing. Experiments demonstrate an average of 2.46 $\times$ acceleration without performance drop in a wide range of editing tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inversion-based image editing is rapidly gaining momentum while suffering from significant computation overhead, hindering its application in real-time interactive scenarios. In this paper, we observe that the redundancy in inversion-based image editing exists in both the spatial and temporal dimensions, such as the unnecessary computation in unedited regions and the redundancy in the inversion process. To tackle these challenges, we propose a practical framework, named EEdit, to achieve efficient image editing. Specifically, we introduce three techniques to solve them one by one. For spatial redundancy, spatial locality caching is introduced to compute the edited region and its neighboring regions while skipping the unedited regions, and token indexing preprocessing is designed to further accelerate the caching. For temporal redundancy, inversion step skipping is proposed to reuse the latent for efficient editing. Our experiments demonstrate an average of 2.46 $\times$ acceleration without performance drop in a wide range of editing tasks including prompt-guided image editing, dragging, and image composition. Our codes are available at https://github.com/yuriYanZeXuan/EEdit
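The spatial-locality idea in the abstract — recompute only the edited region and its neighborhood while reusing cached outputs for untouched tokens — can be sketched as follows. This is a hypothetical illustration, not EEdit's actual code; `compute_fn` stands in for the expensive transformer call, and the dilation radius is an assumed parameter.

```python
import numpy as np

def spatial_locality_update(tokens, cached_out, edit_mask, compute_fn, radius=1):
    """Recompute only tokens in the edited region and its neighbors;
    reuse cached outputs everywhere else (hypothetical sketch).

    tokens:     (H, W, D) array of latent tokens
    cached_out: (H, W, D) outputs cached from a previous step
    edit_mask:  (H, W) boolean mask of the edited region
    compute_fn: function applied only to the active tokens
    """
    # Dilate the mask so tokens neighboring the edit are also refreshed
    active = edit_mask.copy()
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            active |= np.roll(np.roll(edit_mask, dy, axis=0), dx, axis=1)

    out = cached_out.copy()
    # The heavy model call touches only the active subset of tokens
    out[active] = compute_fn(tokens[active])
    return out
```

The savings scale with the fraction of tokens outside the dilated edit mask, which is why localized edits benefit most from this kind of caching.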
Related papers
- SpotEdit: Selective Region Editing in Diffusion Transformers [66.44912649206553]
SpotEdit is a training-free diffusion editing framework that selectively updates only the modified regions. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
arXiv Detail & Related papers (2025-12-26T14:59:41Z) - RegionE: Adaptive Region-Aware Generation for Efficient Image Editing [28.945176886517448]
RegionE is an adaptive, region-aware generation framework that accelerates IIE tasks without additional training. The framework consists of three main components: 1) Adaptive Region Partition, 2) Region-Aware Generation, and 3) Adaptive Velocity Decay Cache. We applied RegionE to state-of-the-art IIE base models, including Step1X-Edit, FLUX.1 Kontext, and Qwen-Image-Edit.
arXiv Detail & Related papers (2025-10-29T14:58:37Z) - FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing [75.29825659756351]
FlashEdit is a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background.
arXiv Detail & Related papers (2025-09-26T11:59:30Z) - NEP: Autoregressive Image Editing via Next Editing Token Prediction [16.69384738678215]
We propose to formulate image editing as Next Editing-token Prediction (NEP) based on autoregressive image generation. Our model naturally supports test-time scaling (TTS) by iteratively refining its generation in a zero-shot manner.
arXiv Detail & Related papers (2025-08-08T06:06:34Z) - Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing [43.082008983889956]
Most diffusion model-based methods use DDIM Inversion as the first stage before editing. We propose a new inversion and sampling method named Dual-Schedule Inversion. We also design a classifier to adaptively combine Dual-Schedule Inversion with different editing methods for user-friendly image editing.
arXiv Detail & Related papers (2024-12-15T11:04:06Z) - PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing [63.38854614997581]
We introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process. The proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions. The method is both inversion- and training-free, requiring approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
arXiv Detail & Related papers (2024-10-07T09:04:50Z) - FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning [34.648413334901164]
We introduce FastEdit, a fast text-guided single-image editing method with semantic-aware diffusion fine-tuning.
FastEdit dramatically accelerates the editing process to only 17 seconds.
We show promising editing capabilities, including content addition, style transfer, background replacement, and posture manipulation.
arXiv Detail & Related papers (2024-08-06T09:16:13Z) - FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and thereby significantly improves editing speed.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
arXiv Detail & Related papers (2024-05-24T17:59:26Z) - Noise Map Guidance: Inversion with Spatial Context for Real Image Editing [23.513950664274997]
Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images.
Their application to editing real images often encounters hurdles due to the text condition deteriorating the reconstruction quality and subsequently affecting editing fidelity.
We present Noise Map Guidance (NMG), an inversion method rich in spatial context, tailored for real-image editing.
arXiv Detail & Related papers (2024-02-07T07:16:12Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - Object-Centric Diffusion for Efficient Video Editing [64.71639719352636]
Diffusion-based video editing has reached impressive quality.
Such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames.
We propose modifications that allow significant speed-ups whilst maintaining quality.
arXiv Detail & Related papers (2024-01-11T08:36:15Z) - ZONE: Zero-Shot Instruction-Guided Local Editing [56.56213730578504]
We propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE.
We first convert the editing intent from the user-provided instruction into specific image editing regions through InstructPix2Pix.
We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segment model.
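A Region-IoU style selection — matching the detected edit region against candidate segmentation masks to extract the right image layer — might look like the following sketch. The function names and scoring are assumptions for illustration, not ZONE's actual implementation.

```python
import numpy as np

def pick_layer_by_iou(edit_region, segment_masks):
    """Return the index and score of the segmentation mask with the
    highest IoU against the edited region (hypothetical sketch)."""
    def iou(a, b):
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return float(inter) / union if union else 0.0

    scores = [iou(edit_region, m) for m in segment_masks]
    best = int(np.argmax(scores))
    return best, scores[best]
```

The selected mask can then serve as the layer boundary for compositing the edited content back into the original image.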
arXiv Detail & Related papers (2023-12-28T02:54:34Z) - Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference [36.73121523987844]
We introduce Fast Image Semantically Edit (FISEdit), a cache-enabled sparse diffusion model inference engine for efficient text-to-image editing.
FISEdit uses semantic mapping between the minor modifications on the input text and the affected regions on the output image.
For each text editing step, FISEdit can automatically identify the affected image regions and utilize the cached unchanged regions' feature map to accelerate the inference process.
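The cache-reuse step described above — compare feature maps before and after a small prompt change, then recompute only the positions that actually moved — can be sketched as follows. This is a hypothetical illustration of cache-enabled sparse inference, not FISEdit's code; the threshold `tau` and `layer_fn` are assumed placeholders.

```python
import numpy as np

def affected_regions(feat_old, feat_new, tau=1e-3):
    """Mark spatial positions whose features changed more than tau
    after a prompt edit; only these are recomputed downstream."""
    diff = np.linalg.norm(feat_new - feat_old, axis=-1)  # (H, W) change magnitude
    return diff > tau

def sparse_layer(feat_new, cached_out, mask, layer_fn):
    """Apply layer_fn only where the mask is set; reuse cached outputs
    for positions whose inputs did not change."""
    out = cached_out.copy()
    out[mask] = layer_fn(feat_new[mask])
    return out
```

Because a minor text edit typically perturbs only a small part of the feature map, most positions hit the cache and the per-step cost drops accordingly.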
arXiv Detail & Related papers (2023-05-27T09:14:03Z) - StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the capacities of pretrained diffusion models for the editing of images. These methods either finetune the model or invert the image into the latent space of the pretrained model. They suffer from two problems: unsatisfactory results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.