Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
- URL: http://arxiv.org/abs/2411.15843v2
- Date: Tue, 26 Nov 2024 07:56:41 GMT
- Title: Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
- Authors: Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, Boyu Wang,
- Abstract summary: In this paper, we analyze the diffusion inversion and invariance control based on the flow transformer.
We propose a two-stage inversion to first refine the velocity estimation and then compensate for the leftover error.
This mechanism can simultaneously preserve the non-target contents while allowing rigid and non-rigid manipulation.
- Score: 43.97960454977206
- License:
- Abstract: Leveraging the large generative prior of the flow transformer for tuning-free image editing requires authentic inversion to project the image into the model's domain and a flexible invariance control mechanism to preserve non-target contents. However, the prevailing diffusion inversion performs deficiently in flow-based models, and the invariance control cannot reconcile diverse rigid and non-rigid editing tasks. To address these, we systematically analyze the \textbf{inversion and invariance} control based on the flow transformer. Specifically, we unveil that the Euler inversion shares a similar structure to DDIM yet is more susceptible to the approximation error. Thus, we propose a two-stage inversion to first refine the velocity estimation and then compensate for the leftover error, which pivots closely to the model prior and benefits editing. Meanwhile, we propose the invariance control that manipulates the text features within the adaptive layer normalization, connecting the changes in the text prompt to image semantics. This mechanism can simultaneously preserve the non-target contents while allowing rigid and non-rigid manipulation, enabling a wide range of editing types such as visual text, quantity, facial expression, etc. Experiments on versatile scenarios validate that our framework achieves flexible and accurate editing, unlocking the potential of the flow transformer for versatile image editing.
Related papers
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT)
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z) - Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation.
Despite their robust generative capabilities, these models often suffer from inaccurate inversion, which could limit their effectiveness in downstream tasks such as image and video editing.
We propose RF-r, a novel training-free sampler that enhances inversion precision by reducing errors in the process of solving rectified flow ODEs.
arXiv Detail & Related papers (2024-11-07T14:29:02Z) - Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations [41.87051958934507]
This paper addresses two key tasks: (i) inversion and (ii) editing of a real image using rectified flow models (such as Flux)
Our inversion method allows for state-of-the-art performance in zero-shot inversion and editing, outperforming prior works in stroke-to-image synthesis and semantic image editing.
arXiv Detail & Related papers (2024-10-14T17:56:24Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, which is grounded in a theoretical analysis of the role of $eta$ in the DDIM sampling equation for enhanced editability.
arXiv Detail & Related papers (2024-03-14T15:07:36Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image
Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
arXiv Detail & Related papers (2023-12-17T21:49:59Z) - Inversion-Free Image Editing with Natural Language [18.373145158518135]
We present inversion-free editing (InfEdit), which allows for consistent and faithful editing for both rigid and non-rigid semantic changes.
InfEdit shows strong performance in various editing tasks and also maintains a seamless workflow (less than 3 seconds on one single A40), demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2023-12-07T18:58:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.