FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing
- URL: http://arxiv.org/abs/2509.22244v3
- Date: Tue, 30 Sep 2025 02:48:09 GMT
- Title: FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing
- Authors: Junyi Wu, Zhiteng Li, Haotong Qin, Xiaohong Liu, Linghe Kong, Yulun Zhang, Xiaokang Yang
- Abstract summary: FlashEdit is a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background.
- Score: 75.29825659756351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, hindering real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity, while performing edits in under 0.2 seconds, which is an over 150$\times$ speedup compared to prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
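The two spatial mechanisms in the abstract, BG-Shield and SSCA, can be illustrated with a toy numpy sketch. This is a hypothetical reconstruction from the abstract alone (function names, shapes, and the token-level masking are assumptions, not the authors' implementation): features are blended so that only the edit region changes, and cross-attention output is zeroed for background tokens so prompt semantics cannot leak there.

```python
import numpy as np

def bg_shield_blend(src_feats, edit_feats, edit_mask):
    # Hypothetical BG-Shield-style blend: keep the source features for
    # background tokens and take edited features only inside the edit region.
    # src_feats, edit_feats: (num_tokens, channels); edit_mask: (num_tokens,) bool.
    m = edit_mask[:, None].astype(src_feats.dtype)
    return m * edit_feats + (1.0 - m) * src_feats

def sparsified_cross_attention(q, k, v, edit_mask):
    # Hypothetical SSCA-style attention: compute standard softmax
    # cross-attention from image queries to text keys/values, then zero the
    # output for background queries to suppress semantic leakage.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ v
    return np.where(edit_mask[:, None], out, 0.0)
```

In this sketch, background rows of the blended features are bit-identical to the source features, which is one way to read the abstract's "guarantees background preservation" claim.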
Related papers
- FusionEdit: Semantic Fusion and Attention Modulation for Training-Free Image Editing [7.53296048773288]
Text-guided image editing aims to modify specific regions according to the target prompt while preserving the identity of the source image.
Recent methods exploit explicit binary masks to constrain editing, but hard mask boundaries introduce artifacts and reduce editability.
We propose FusionEdit, a training-free image editing framework that achieves precise and controllable edits.
arXiv Detail & Related papers (2026-02-09T14:34:18Z)
- RemEdit: Efficient Diffusion Editing with Riemannian Geometry [1.8594036119086927]
RemEdit is a diffusion-based framework for image editing.
For editing fidelity, we use a Mamba-based module and a goal-aware prompt-enrichment pass from a vision-language model.
For additional acceleration, we introduce a novel task-specific attention pruning mechanism.
RemEdit surpasses prior state-of-the-art editing frameworks while maintaining real-time performance under 50% pruning.
arXiv Detail & Related papers (2026-01-25T17:58:57Z)
- SpotEdit: Selective Region Editing in Diffusion Transformers [66.44912649206553]
SpotEdit is a training-free diffusion editing framework that selectively updates only the modified regions.
By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
arXiv Detail & Related papers (2025-12-26T14:59:41Z)
- Visual Autoregressive Modeling for Instruction-Guided Image Editing [97.04821896251681]
We present a visual autoregressive framework that reframes image editing as a next-scale prediction problem.
VarEdit generates multi-scale target features to achieve precise edits.
It completes a $512\times512$ edit in 1.2 seconds, making it 2.2$\times$ faster than the similarly sized UltraEdit.
arXiv Detail & Related papers (2025-08-21T17:59:32Z)
- FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model [54.693572837423226]
FireEdit is an innovative fine-grained instruction-based image editing framework that exploits a region-aware VLM.
FireEdit is designed to accurately comprehend user instructions and ensure effective control over the editing process.
Our approach surpasses state-of-the-art instruction-based image editing methods.
arXiv Detail & Related papers (2025-03-25T16:59:42Z)
- FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning [34.648413334901164]
We introduce FastEdit, a fast text-guided single-image editing method with semantic-aware diffusion fine-tuning.
FastEdit dramatically accelerates the editing process to only 17 seconds.
We show promising editing capabilities, including content addition, style transfer, background replacement, and posture manipulation.
arXiv Detail & Related papers (2024-08-06T09:16:13Z)
- TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
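The pseudo-guidance idea described above can be sketched as a simple extrapolation in noise-prediction space, mirroring classifier-free guidance. The function below is a hypothetical illustration inferred from the abstract, not the authors' exact formulation:

```python
import numpy as np

def pseudo_guided_pred(eps_src, eps_tgt, w=2.0):
    # Hypothetical pseudo-guidance sketch: extrapolate from the source-prompt
    # noise prediction toward the target-prompt one. w = 1 reproduces the
    # plain edit; w > 1 amplifies its magnitude without an extra model pass.
    return eps_src + w * (eps_tgt - eps_src)
```

Because the extrapolation reuses predictions that are already computed, increasing the edit magnitude this way adds essentially no cost per sampling step.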
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
- Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing [20.01946775715704]
We propose a tuning-free method with only two branches: inversion and editing.
This approach allows users to simultaneously edit the object's action and control the generation position of the edited object.
Impressive image editing results and quantitative evaluation demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-25T08:00:49Z)
- FastDrag: Manipulate Anything in One Step [20.494157877241665]
We introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process.
This innovation achieves one-step latent semantic optimization and hence significantly promotes editing speeds.
Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods.
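A one-step latent update of this kind can be shown with a toy sketch. This is a hypothetical illustration of the single-step idea, not FastDrag's actual algorithm: rather than iteratively optimizing the latent, the feature at the handle point is relocated to the target point in a single assignment.

```python
import numpy as np

def one_step_drag(latent, handle, target):
    # Toy one-step drag sketch: copy the latent feature vector at the handle
    # coordinate to the target coordinate in a single step, instead of
    # running an iterative per-point latent optimization.
    # latent: (H, W, C); handle, target: (row, col) coordinates.
    out = latent.copy()
    (hy, hx), (ty, tx) = handle, target
    out[ty, tx] = latent[hy, hx]
    return out
```

Collapsing the per-point optimization into one assignment is what removes the latency of the iterative loop; the real method would also need to fill or smooth the vacated region, which this sketch omits.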
arXiv Detail & Related papers (2024-05-24T17:59:26Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this list (including all information) and is not responsible for any consequences of its use.