Latent Space Editing in Transformer-Based Flow Matching
- URL: http://arxiv.org/abs/2312.10825v1
- Date: Sun, 17 Dec 2023 21:49:59 GMT
- Title: Latent Space Editing in Transformer-Based Flow Matching
- Authors: Vincent Tao Hu, David W Zhang, Pascal Mettes, Meng Tang, Deli Zhao,
Cees G.M. Snoek
- Abstract summary: Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
- Score: 53.75073756305241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper strives for image editing via generative models. Flow Matching is
an emerging generative modeling technique that offers the advantage of simple
and efficient training. Simultaneously, a new transformer-based U-ViT has
recently been proposed to replace the commonly used UNet for better scalability
and performance in generative modeling. Flow Matching with a transformer
backbone therefore offers the potential for scalable and high-quality generative
modeling, but its latent structure and editing ability are as yet unknown. We
adopt this setting and explore how to edit images through latent
space manipulation. We introduce an editing space, which we call $u$-space,
that can be manipulated in a controllable, accumulative, and composable manner.
Additionally, we propose a tailored sampling solution to enable sampling with
the more efficient adaptive step-size ODE solvers. Lastly, we put forth a
straightforward yet powerful method for achieving fine-grained and nuanced
editing using text prompts. Our framework is simple and efficient, yet highly
effective at editing images while preserving the essence of the original
content. Our code will be publicly available at https://taohu.me/lfm/
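Background sketch (not from the paper): in Flow Matching, a velocity network $v_\theta(x_t, t)$ is typically trained on straight-line interpolants $x_t = (1-t)x_0 + t\,x_1$ with the objective $\mathbb{E}_{t,x_0,x_1}\lVert v_\theta(x_t,t) - (x_1 - x_0)\rVert^2$, and samples are obtained by integrating the learned ODE from noise to data, for which adaptive step-size solvers such as dopri5 can be used. The minimal Python sketch below illustrates this generic recipe together with a simple latent-shift edit applied to an intermediate state; the toy network, the `edit_dir` direction, and all function names are illustrative assumptions and do not reproduce the paper's $u$-space construction or its tailored sampling solution.

```python
# Minimal flow-matching sketch with a latent-shift "edit" (illustrative only;
# names and the editing mechanism are assumptions, not the paper's u-space).
import torch
from torchdiffeq import odeint  # adaptive-step ODE solvers (e.g. dopri5)


class VelocityField(torch.nn.Module):
    """Toy MLP stand-in for a transformer (U-ViT) velocity network v_theta(x, t)."""

    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 256),
            torch.nn.SiLU(),
            torch.nn.Linear(256, dim),
        )

    def forward(self, t, x):
        if t.dim() == 0:                      # odeint passes a scalar time
            t = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t], dim=-1))


def flow_matching_loss(model, x1):
    """Conditional flow-matching loss on straight-line interpolants."""
    x0 = torch.randn_like(x1)                 # noise endpoint
    t = torch.rand(x1.shape[0], 1)            # per-sample time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # linear interpolant x_t
    return ((model(t, xt) - (x1 - x0)) ** 2).mean()


@torch.no_grad()
def sample_with_edit(model, x0, edit_dir, strength=0.0, t_edit=0.5):
    """Integrate the learned ODE with an adaptive solver and optionally shift
    the intermediate state along a (hypothetical) editing direction."""
    first = torch.tensor([0.0, t_edit])
    second = torch.tensor([t_edit, 1.0])
    x_mid = odeint(model, x0, first, method="dopri5", rtol=1e-5, atol=1e-5)[-1]
    x_mid = x_mid + strength * edit_dir       # accumulative, composable shift
    return odeint(model, x_mid, second, method="dopri5", rtol=1e-5, atol=1e-5)[-1]


# Usage: fit a toy distribution, then sample with and without the edit.
dim = 16
model = VelocityField(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(512, dim) + 3.0            # toy "image latent" data
for _ in range(200):
    batch = data[torch.randint(0, data.shape[0], (64,))]
    loss = flow_matching_loss(model, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()

edit_dir = torch.nn.functional.normalize(torch.randn(1, dim), dim=-1)
x0 = torch.randn(4, dim)
plain = sample_with_edit(model, x0, edit_dir, strength=0.0)
edited = sample_with_edit(model, x0, edit_dir, strength=2.0)
```

Here the dopri5 (adaptive Runge-Kutta) solver stands in for the adaptive step-size sampling mentioned in the abstract; in the actual method, the paper's tailored sampling solution and learned $u$-space directions would replace the toy MLP and the random `edit_dir`.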
Related papers
- Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing [43.97960454977206]
In this paper, we analyze the diffusion inversion and invariance control based on the flow transformer.
We propose a two-stage inversion to first refine the velocity estimation and then compensate for the leftover error.
This mechanism can simultaneously preserve the non-target contents while allowing rigid and non-rigid manipulation.
arXiv Detail & Related papers (2024-11-24T13:48:16Z)
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
- Editable Image Elements for Controllable Synthesis [79.58148778509769]
We propose an image representation that promotes spatial editing of input images using a diffusion model.
We show the effectiveness of our representation on various image editing tasks, such as object resizing, rearrangement, dragging, de-occlusion, removal, variation, and image composition.
arXiv Detail & Related papers (2024-04-24T17:59:11Z)
- HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks [5.9189325968909365]
We propose an innovative image editing method called HyperEditor, which utilizes weight factors generated by hypernetworks to reassign the weights of the pre-trained StyleGAN2's generator.
Guided by CLIP's cross-modal image-text semantic alignment, this innovative approach enables us to simultaneously accomplish authentic attribute editing and cross-domain style transfer.
arXiv Detail & Related papers (2023-12-21T02:39:53Z)
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z)
- Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator such that it achieves a local edit to the generator's weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
arXiv Detail & Related papers (2023-02-22T14:47:57Z)
- Null-text Inversion for Editing Real Images using Guided Diffusion Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z)
- Style Transformer for Image Inversion and Editing [35.45674653596084]
Existing GAN inversion methods fail to provide latent codes for reliable reconstruction and flexible editing simultaneously.
This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN.
The proposed model employs a CNN encoder to provide multi-scale image features as keys and values.
arXiv Detail & Related papers (2022-03-15T14:16:57Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high quality, high precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)