SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing
- URL: http://arxiv.org/abs/2510.25970v1
- Date: Wed, 29 Oct 2025 21:12:58 GMT
- Title: SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing
- Authors: Sung-Hoon Yoon, Minghan Li, Gaspard Beaudouin, Congcong Wen, Muhammad Rafay Azhar, Mengyu Wang
- Abstract summary: Rectified flow models have become a de facto standard in image generation due to their stable sampling trajectories and high-fidelity outputs. Despite their strong generative capabilities, they face critical limitations in image editing tasks. Recent efforts have attempted to directly map source and target distributions via ODE-based approaches without inversion. We propose a flow decomposition-and-aggregation framework built upon an inversion-free formulation to address these limitations.
- Score: 15.234877788378563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rectified flow models have become a de facto standard in image generation due to their stable sampling trajectories and high-fidelity outputs. Despite their strong generative capabilities, they face critical limitations in image editing tasks: inaccurate inversion processes for mapping real images back into the latent space, and gradient entanglement issues during editing often result in outputs that do not faithfully reflect the target prompt. Recent efforts have attempted to directly map source and target distributions via ODE-based approaches without inversion; however, these methods still yield suboptimal editing quality. In this work, we propose a flow decomposition-and-aggregation framework built upon an inversion-free formulation to address these limitations. Specifically, we semantically decompose the target prompt into multiple sub-prompts, compute an independent flow for each, and aggregate them to form a unified editing trajectory. While we empirically observe that decomposing the original flow enhances diversity in the target space, generating semantically aligned outputs still requires consistent guidance toward the full target prompt. To this end, we design a projection and soft-aggregation mechanism for flow, inspired by gradient conflict resolution in multi-task learning. This approach adaptively weights the sub-target velocity fields, suppressing semantic redundancy while emphasizing distinct directions, thereby preserving both diversity and consistency in the final edited output. Experimental results demonstrate that our method outperforms existing zero-shot editing approaches in terms of semantic fidelity and attribute disentanglement. The code is available at https://github.com/Harvard-AI-and-Robotics-Lab/SplitFlow.
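The projection and soft-aggregation step can be pictured with a short sketch. The snippet below is an illustrative assumption, not the released SplitFlow implementation: it applies a PCGrad-style pairwise projection to the per-sub-prompt velocity fields and then soft-weights them by their cosine alignment with the full-prompt velocity. The function name `aggregate_subprompt_velocities`, the alignment-based weighting, and the `temperature` parameter are hypothetical choices; see the linked repository for the authors' actual formulation.

```python
import torch
import torch.nn.functional as F


def aggregate_subprompt_velocities(v_subs, v_full, temperature=1.0):
    """Illustrative sketch (not the released SplitFlow code): PCGrad-style
    projection of per-sub-prompt velocity fields followed by a
    soft-aggregation weighted by alignment with the full-prompt velocity.

    v_subs : list of tensors, each (B, C, H, W), one per sub-prompt
    v_full : tensor (B, C, H, W), velocity for the full target prompt
    """
    flat = [v.flatten(1) for v in v_subs]  # (B, D) per sub-prompt
    projected = []
    for i, vi in enumerate(flat):
        vi_proj = vi.clone()
        for j, vj in enumerate(flat):
            if i == j:
                continue
            # If two sub-prompt velocities conflict (negative inner product),
            # remove the component of v_i that opposes v_j, as in PCGrad.
            dot = (vi_proj * vj).sum(dim=1, keepdim=True)               # (B, 1)
            denom = vj.pow(2).sum(dim=1, keepdim=True).clamp_min(1e-8)  # (B, 1)
            vi_proj = torch.where(dot < 0, vi_proj - (dot / denom) * vj, vi_proj)
        projected.append(vi_proj)

    # Soft-aggregation: down-weight sub-velocities that are redundant or
    # misaligned with the full-prompt velocity (an assumed weighting choice).
    full_flat = v_full.flatten(1)                                        # (B, D)
    sims = torch.stack(
        [F.cosine_similarity(p, full_flat, dim=1) for p in projected], dim=0
    )                                                                    # (K, B)
    weights = torch.softmax(sims / temperature, dim=0)                   # (K, B)
    agg = sum(w.unsqueeze(1) * p for w, p in zip(weights, projected))    # (B, D)
    return agg.view_as(v_full)


# Toy usage with random tensors standing in for model-predicted velocities.
if __name__ == "__main__":
    subs = [torch.randn(1, 4, 64, 64) for _ in range(3)]
    full = torch.randn(1, 4, 64, 64)
    v_edit = aggregate_subprompt_velocities(subs, full)
    print(v_edit.shape)  # torch.Size([1, 4, 64, 64])
```

In this reading, the projection suppresses mutually conflicting semantic directions while the softmax weights keep every sub-prompt's distinct contribution, which matches the abstract's stated goal of preserving both diversity and consistency.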
Related papers
- EditInfinity: Image Editing with Binary-Quantized Generative Models [64.05135380710749]
We investigate the parameter-efficient adaptation of VQ-based generative models for image editing. We propose an efficient yet effective image inversion mechanism that integrates text prompting rectification and image style preservation. Experiments on the PIE-Bench benchmark demonstrate the superior performance of our model compared to state-of-the-art diffusion-based baselines.
arXiv Detail & Related papers (2025-10-23T05:06:24Z) - FlowCycle: Pursuing Cycle-Consistent Flows for Text-based Editing [12.424207508842192]
We propose FlowCycle, a novel inversion-free and flow-based editing framework. We show that FlowCycle achieves superior editing quality and consistency over state-of-the-art methods.
arXiv Detail & Related papers (2025-10-23T04:58:29Z) - OSCAR: Orthogonal Stochastic Control for Alignment-Respecting Diversity in Flow Matching [14.664226708184676]
Flow-based text-to-image models follow deterministic trajectories, forcing users to sample repeatedly to discover diverse modes. We present a training-free, inference-time control mechanism that makes the flow itself diversity-aware.
arXiv Detail & Related papers (2025-10-10T07:07:19Z) - TweezeEdit: Consistent and Efficient Image Editing with Path Regularization [6.248205481752008]
We propose TweezeEdit, a tuning- and inversion-free framework for consistent and efficient image editing. Our method addresses these limitations by regularizing the entire denoising path rather than relying solely on inversion anchors. Experiments demonstrate TweezeEdit's superior performance in semantic preservation and target alignment, outperforming existing methods.
arXiv Detail & Related papers (2025-08-14T09:59:45Z) - Training-free Geometric Image Editing on Diffusion Models [53.38549950608886]
We tackle the task of geometric image editing, where an object within an image is repositioned, reoriented, or reshaped. We propose a decoupled pipeline that separates object transformation, source region inpainting, and target region refinement. Both inpainting and refinement are implemented using a training-free diffusion approach, FreeFine.
arXiv Detail & Related papers (2025-07-31T07:36:00Z) - FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing [47.908940130654535]
FlowAlign is an inversion-free flow-based framework for consistent image editing with optimal-control-based trajectory control. Our terminal point regularization is shown to balance semantic alignment with the edit prompt and structural consistency with the source image along the trajectory. FlowAlign outperforms existing methods in both source preservation and editing controllability.
arXiv Detail & Related papers (2025-05-29T06:33:16Z) - OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting [54.525583840585305]
We introduce OmniPaint, a unified framework that re-conceptualizes object removal and insertion as interdependent processes. Our novel CFD metric offers a robust, reference-free evaluation of context consistency and object hallucination.
arXiv Detail & Related papers (2025-03-11T17:55:27Z) - Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT). We propose an automatic method to identify "vital layers" within DiT, crucial for image formation. Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z) - Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers like FLUX and OpenSora have demonstrated outstanding performance in the field of image and video generation. Despite their robust generative capabilities, these models often struggle with inaccurate inversion. We propose RF-Solver, a training-free sampler that effectively enhances inversion precision by mitigating the errors in the inversion process of rectified flow.
arXiv Detail & Related papers (2024-11-07T14:29:02Z) - Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance in various editing tasks and also maintains a seamless workflow (less than 3 seconds on a single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z)