FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing
- URL: http://arxiv.org/abs/2512.11395v1
- Date: Fri, 12 Dec 2025 09:08:39 GMT
- Title: FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing
- Authors: Yilei Jiang, Zhen Wang, Yanghao Wang, Jun Yu, Yueting Zhuang, Jun Xiao, Long Chen
- Abstract summary: We propose FlowDC, which decouples complex editing into multiple sub-editing effects and superposes them in parallel during the editing process. FlowDC shows superior results compared with existing methods.
- Score: 52.54102743380658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the surge of pre-trained text-to-image flow matching models, text-based image editing has improved remarkably, especially for simple editing that contains only a single editing target. To satisfy exploding editing requirements, complex editing, which contains multiple editing targets, poses a more challenging task. However, the current complex-editing solutions, single-round and multi-round editing, are limited by long-text following and cumulative inconsistency, respectively; thus, they struggle to strike a balance between semantic alignment and source consistency. In this paper, we propose FlowDC, which decouples complex editing into multiple sub-editing effects and superposes them in parallel during the editing process. Meanwhile, we observe that the velocity component orthogonal to the editing displacement harms source structure preservation, so we decompose the velocity and decay the orthogonal part for better source consistency. To evaluate the effectiveness of complex editing settings, we construct a complex editing benchmark: Complex-PIE-Bench. On two benchmarks, FlowDC shows superior results compared with existing methods. We also detail ablations of our module designs.
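A minimal sketch of the two mechanisms named in the abstract, assuming latent-vector velocities and one Euler step of a flow ODE; the function names, the decay factor, and the way sub-edit velocities are superposed are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def decay_orthogonal(v, d, decay=0.3, eps=1e-8):
    """Decompose velocity v against the editing displacement d and
    attenuate the orthogonal component, which harms source structure."""
    u = d / (np.linalg.norm(d) + eps)   # unit editing direction
    v_par = np.dot(v, u) * u            # component along the edit
    v_orth = v - v_par                  # component off the edit
    return v_par + decay * v_orth

def superpose_subedits(v_src, v_subs):
    """Superpose parallel sub-editing effects: add each sub-edit's
    velocity offset relative to the source-reconstruction velocity."""
    return v_src + sum(v - v_src for v in v_subs)

# toy Euler step of the editing flow with both mechanisms
rng = np.random.default_rng(0)
z = rng.standard_normal(16)                            # current latent
v_src = rng.standard_normal(16)                        # source-branch velocity
v_subs = [rng.standard_normal(16) for _ in range(3)]   # per-sub-edit velocities
d = rng.standard_normal(16)                            # displacement toward the edit
v = superpose_subedits(v_src, v_subs)
z_next = z + 0.1 * decay_orthogonal(v, d)
```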
Related papers
- SpotEdit: Selective Region Editing in Diffusion Transformers [66.44912649206553]
SpotEdit is a training-free diffusion editing framework that selectively updates only the modified regions. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
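A hedged sketch of what "selectively updating only the modified regions" could look like at the latent level; the blending function and mask handling are assumptions, not SpotEdit's actual method:

```python
import numpy as np

def selective_update(src_latent, edit_latent, region_mask):
    """Take the edit branch's prediction inside the masked regions and
    keep the source latent everywhere else, preserving unmodified areas."""
    return region_mask * edit_latent + (1.0 - region_mask) * src_latent

# toy 4x4 latent with a 2x2 edited patch
src = np.zeros((4, 4))
edit = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
blended = selective_update(src, edit, mask)   # only the top-left patch changes
```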
arXiv Detail & Related papers (2025-12-26T14:59:41Z) - Group Relative Attention Guidance for Image Editing [38.299491082179905]
Group Relative Attention Guidance (GRAG) is a simple yet effective method that modulates the model's focus on the input image relative to the editing instruction. Our code will be released at https://www.littlemisfit.com/little-misfit/GRAG-Image-Editing.
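One plausible, non-authoritative rendering of shifting the model's focus between image tokens and instruction tokens: add a group-wise bias to the attention logits before the softmax. The grouping scheme and bias value are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_biased_attention(q, k, v, image_cols, image_bias=0.5):
    """Scaled dot-product attention with an extra bias on the logits of
    the image-token group, shifting focus relative to the instruction."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits[:, image_cols] += image_bias   # >0 emphasizes the image keys
    return softmax(logits) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 8)) for _ in range(3))
out = group_biased_attention(q, k, v, image_cols=slice(0, 3))
```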
arXiv Detail & Related papers (2025-10-28T17:22:44Z) - Image Editing As Programs with Diffusion Models [69.05164729625052]
We introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. IEAP approaches instructional editing through a reductionist lens, decomposing complex editing instructions into sequences of atomic operations. Our framework delivers superior accuracy and semantic fidelity, particularly for complex, multi-step instructions.
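A minimal sketch of the "editing as programs" idea: a complex instruction becomes a sequence of atomic operations applied in order. The operation names and the placeholder apply functions are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AtomicOp:
    name: str
    apply: Callable  # image -> image, e.g. one targeted diffusion pass

def run_program(image, program: List[AtomicOp]):
    """Execute a decomposed editing instruction step by step."""
    for op in program:
        image = op.apply(image)
    return image

# hypothetical decomposition of "remove the hat and make the sky a sunset"
program = [
    AtomicOp("remove_object(hat)", lambda img: img),          # placeholder pass
    AtomicOp("restyle_region(sky, sunset)", lambda img: img), # placeholder pass
]
edited = run_program("source-image", program)
```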
arXiv Detail & Related papers (2025-06-04T16:57:24Z) - h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform [13.243229817244275]
h-Edit is a training-free method capable of performing simultaneous text-guided and reward-model-based editing.<n>Our experiments show that h-Edit outperforms state-of-the-art baselines in terms of editing effectiveness and faithfulness.
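A rough sketch of how simultaneous text-guided and reward-model-based editing can enter a single sampling step: tilt the reverse drift with both gradient terms, in the spirit of an h-transform. The weights and the toy callables are illustrative assumptions:

```python
import numpy as np

def tilted_step(x, base_drift, text_grad, reward_grad,
                dt=0.02, w_text=1.0, w_reward=0.5):
    """One reverse-time update whose drift is the model's base drift
    plus weighted gradients from text guidance and a reward model."""
    drift = base_drift(x) + w_text * text_grad(x) + w_reward * reward_grad(x)
    return x + dt * drift

# toy callables standing in for the real model and reward gradients
x = np.zeros(8)
x = tilted_step(x,
                base_drift=lambda z: -z,             # pull toward the data mode
                text_grad=lambda z: np.ones_like(z), # text-guidance direction
                reward_grad=lambda z: -0.1 * z)      # reward-model direction
```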
arXiv Detail & Related papers (2025-03-04T01:49:59Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
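A hedged sketch of what "reciprocal optimization" could mean in code: alternate a fidelity-oriented inversion refinement with an editability-oriented refinement, each seeing the other's latest result. The step functions and round count are assumptions:

```python
def reciprocal_optimise(z, refine_inversion, refine_edit, n_rounds=5):
    """Alternate between refining the inverted latents for fidelity and
    refining them for the editing task."""
    for _ in range(n_rounds):
        z = refine_inversion(z)   # pull z toward reconstructing the source
        z = refine_edit(z)        # pull z toward satisfying the edit
    return z

# toy usage with scalar stand-ins for latents
z_star = reciprocal_optimise(1.0,
                             refine_inversion=lambda z: 0.9 * z,
                             refine_edit=lambda z: z + 0.05)
```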
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tuned to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
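A minimal sketch of the adjustable multi-condition objective described above: one weighted sum over per-condition losses, where changing a weight changes that condition's influence. The condition names and weights are illustrative:

```python
def combined_edit_loss(losses, weights):
    """Weighted sum of per-condition editing losses; raising a weight
    makes the optimisation follow that condition more closely."""
    return sum(w * l for w, l in zip(weights, losses))

# hypothetical per-condition losses at one optimisation step
l_text, l_pose, l_scribble = 0.8, 0.3, 0.5
total = combined_edit_loss([l_text, l_pose, l_scribble], [1.0, 0.5, 0.25])
```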
arXiv Detail & Related papers (2023-11-28T15:31:11Z) - Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
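A sketch of the search loop implied by the summary: try candidate inversion depths for each editing pair and keep the one the search metric scores best. The edit function and metric here are placeholders, not OIR's actual components:

```python
def search_inversion_step(candidate_steps, edit_at_step, score):
    """Return the inversion depth whose edit result scores highest."""
    best_t, best_s = None, float("-inf")
    for t in candidate_steps:
        s = score(edit_at_step(t))
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s

# toy usage: a metric that peaks at step 30
best = search_inversion_step(range(10, 60, 10),
                             edit_at_step=lambda t: t,
                             score=lambda r: -abs(r - 30))
```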
arXiv Detail & Related papers (2023-10-18T17:59:02Z) - Forgedit: Text Guided Image Editing via Learning and Forgetting [17.26772361532044]
We design a novel text-guided image editing method, named as Forgedit.
First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds.
Then, we propose a novel vector projection mechanism in text embedding space of Diffusion Models.
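A minimal sketch of a vector-projection mechanism in text-embedding space: remove the component of an embedding along a given direction, e.g. to "forget" source-specific information while keeping the rest. The direction and its interpretation are assumptions:

```python
import numpy as np

def project_out(embedding, direction, eps=1e-8):
    """Strip the component of a text embedding along `direction`,
    keeping everything orthogonal to it."""
    u = direction / (np.linalg.norm(direction) + eps)
    return embedding - np.dot(embedding, u) * u

rng = np.random.default_rng(0)
e = rng.standard_normal(768)   # a text embedding
d = rng.standard_normal(768)   # hypothetical direction to be forgotten
e_forgot = project_out(e, d)   # now orthogonal to d
```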
arXiv Detail & Related papers (2023-09-19T12:05:26Z) - InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing [27.661609140918916]
InFusion is a framework for zero-shot text-based video editing.
It supports editing of multiple concepts with pixel-level control over diverse concepts mentioned in the editing prompt.
Our framework is a low-cost alternative to one-shot tuned models for editing since it does not require training.
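A hedged sketch of "inject and attention fusion": keep the source branch's attention maps for tokens that should stay unchanged and take the edit branch's maps only for the columns of edited concepts. The indexing scheme is an assumption:

```python
import numpy as np

def fuse_attention(attn_src, attn_edit, edited_token_cols):
    """Fuse two cross-attention maps: source maps for unchanged tokens,
    edit maps for tokens named in the editing prompt."""
    fused = attn_src.copy()
    fused[:, edited_token_cols] = attn_edit[:, edited_token_cols]
    return fused

rng = np.random.default_rng(0)
attn_src = rng.random((64, 10))    # (query pixels, prompt tokens)
attn_edit = rng.random((64, 10))
fused = fuse_attention(attn_src, attn_edit, edited_token_cols=[3, 4])
```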
arXiv Detail & Related papers (2023-07-22T17:05:47Z) - LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a lightweight combined approach for real-image editing that pairs the Edit Friendly DDPM inversion technique with Semantic Guidance.
This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring no optimization or extensions to the architecture.
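A rough sketch of the semantic-guidance half: shift the noise prediction along per-concept directions, each measured against the unconditional prediction, on top of noise maps fixed by the edit-friendly inversion. The guidance scales are illustrative assumptions:

```python
import numpy as np

def semantically_guided_eps(eps_uncond, eps_concepts, scales):
    """Move the noise prediction toward (or away from) each concept:
    positive scales add a concept, negative scales remove it."""
    eps = eps_uncond.copy()
    for eps_c, s in zip(eps_concepts, scales):
        eps += s * (eps_c - eps_uncond)
    return eps

rng = np.random.default_rng(0)
eps_u = rng.standard_normal(16)
eps_cs = [rng.standard_normal(16) for _ in range(2)]
eps = semantically_guided_eps(eps_u, eps_cs, scales=[1.5, -1.0])
```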
arXiv Detail & Related papers (2023-07-02T09:11:09Z)