COT Flow: Learning Optimal-Transport Image Sampling and Editing by Contrastive Pairs
- URL: http://arxiv.org/abs/2406.12140v1
- Date: Mon, 17 Jun 2024 23:02:20 GMT
- Title: COT Flow: Learning Optimal-Transport Image Sampling and Editing by Contrastive Pairs
- Authors: Xinrui Zu, Qian Tao
- Abstract summary: Contrastive Optimal Transport Flow (COT Flow) is a new method that achieves fast and high-quality generation with improved zero-shot editing flexibility.
In terms of quality, COT Flow can generate competitive results in merely one step compared to previous state-of-the-art unpaired image-to-image (I2I) translation methods.
- Score: 7.542892664684078
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models have demonstrated strong performance in sampling and editing multi-modal data with high generation quality, yet they suffer from the iterative generation process which is computationally expensive and slow. In addition, most methods are constrained to generate data from Gaussian noise, which limits their sampling and editing flexibility. To overcome both disadvantages, we present Contrastive Optimal Transport Flow (COT Flow), a new method that achieves fast and high-quality generation with improved zero-shot editing flexibility compared to previous diffusion models. Benefiting from optimal transport (OT), our method has no limitation on the prior distribution, enabling unpaired image-to-image (I2I) translation and doubling the editable space (at both the start and end of the trajectory) compared to other zero-shot editing methods. In terms of quality, COT Flow can generate competitive results in merely one step compared to previous state-of-the-art unpaired image-to-image (I2I) translation methods. To highlight the advantages of COT Flow through the introduction of OT, we introduce the COT Editor to perform user-guided editing with excellent flexibility and quality. The code will be released at https://github.com/zuxinrui/cot_flow.
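For intuition on what one-step sampling from an unconstrained OT prior looks like, here is a minimal sketch assuming a trained velocity network `v_theta` (a placeholder name, not the released API): sampling is a single Euler step of the flow ODE, and because the prior need not be Gaussian, the starting point can be a source-domain image, leaving both trajectory endpoints open to editing.

```python
import torch

def one_step_translate(v_theta, x_source):
    """Single Euler step of the learned flow ODE dx/dt = v_theta(x, t),
    integrated from t=0 to t=1: x_1 = x_0 + v_theta(x_0, 0)."""
    t0 = torch.zeros(x_source.shape[0], device=x_source.device)
    return x_source + v_theta(x_source, t0)

def multi_step_translate(v_theta, x_source, steps=4):
    """A few more Euler steps trade speed for fidelity; since the OT prior is
    unconstrained, x_source can be a real image from another domain rather
    than Gaussian noise."""
    x, dt = x_source, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * v_theta(x, t)
    return x
```

Edits applied to `x_source` before the solve, or to the output after it, correspond to the two editable ends of the trajectory described in the abstract.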
Related papers
- In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer [32.45070206621554]
We present an in-context editing framework for zero-shot instruction compliance using in-context prompting.
We also present a LoRA-MoE hybrid tuning strategy that enhances flexibility with efficient adaptation and dynamic expert routing.
This work establishes a new paradigm that enables high-precision yet efficient instruction-guided editing.
arXiv Detail & Related papers (2025-04-29T12:14:47Z)
- Accelerate High-Quality Diffusion Models with Inner Loop Feedback [50.00066451431194]
Inner Loop Feedback (ILF) is a novel approach to accelerate diffusion models' inference.
ILF trains a lightweight module to predict future features in the denoising process.
ILF achieves strong performance for both class-to-image generation with diffusion transformer (DiT) and text-to-image generation with DiT-based PixArt-alpha and PixArt-sigma.
arXiv Detail & Related papers (2025-01-22T18:59:58Z)
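The ILF abstract gives few implementation details; the sketch below only illustrates the general pattern of trading full backbone passes for a learned feature predictor. `FeaturePredictor`, `features_fn`, and `decode_fn` are hypothetical stand-ins, not the paper's actual modules.

```python
import torch
import torch.nn as nn

class FeaturePredictor(nn.Module):
    """Hypothetical lightweight module: given the backbone's features at the
    current denoising step, predict the features of a future step."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, feats):
        return feats + self.net(feats)  # residual prediction of future features

def sample_with_feedback(features_fn, decode_fn, predictor, x, timesteps):
    """Alternate the expensive backbone with the cheap predictor.
    features_fn and decode_fn are hypothetical hooks into the backbone."""
    feats = None
    for i, t in enumerate(timesteps):
        if i % 2 == 0 or feats is None:
            feats = features_fn(x, t)   # full (expensive) forward pass
        else:
            feats = predictor(feats)    # predicted features, nearly free
        x = decode_fn(x, t, feats)      # turn features into the denoising update
    return x
```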
- FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models [20.46531356084352]
Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map.
Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models, which is inversion-free, optimization-free and model agnostic.
arXiv Detail & Related papers (2024-12-11T18:50:29Z)
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
- Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers, such as FLUX and OpenSora, have demonstrated exceptional performance in the field of image and video generation.
Despite their robust generative capabilities, these models often suffer from inaccurate inversion, which could limit their effectiveness in downstream tasks such as image and video editing.
We propose RF-Solver, a novel training-free sampler that enhances inversion precision by reducing errors in the process of solving rectified flow ODEs.
arXiv Detail & Related papers (2024-11-07T14:29:02Z)
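To see why solving the rectified-flow ODE more accurately improves inversion, compare a first-order and a second-order reverse solve. This is a generic numerical-integration illustration with a placeholder velocity net `v_theta`, not the sampler proposed in the paper.

```python
import torch

def invert_euler(v_theta, x1, steps=50):
    """Map an image back to its latent by solving the flow ODE in reverse
    with first-order Euler steps; truncation error accumulates per step."""
    x, dt = x1, 1.0 / steps
    for i in range(steps, 0, -1):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x - dt * v_theta(x, t)
    return x

def invert_heun(v_theta, x1, steps=50):
    """Second-order (Heun) steps average the slopes at both ends of each
    interval, shrinking the local error from O(dt^2) to O(dt^3)."""
    x, dt = x1, 1.0 / steps
    for i in range(steps, 0, -1):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        k1 = v_theta(x, t)
        k2 = v_theta(x - dt * k1, t - dt)
        x = x - dt * 0.5 * (k1 + k2)
    return x
```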
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints.
During inference, we alternate between gradient updates computed on the noisy image and updates computed on the final, clean image.
Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
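The "updates computed on the final, clean image" can be pictured via the standard Tweedie estimate: predict x0 from the current noisy sample and differentiate the constraint through that prediction. Below is a minimal sketch under DDPM-style assumptions, with `eps_model` (a noise predictor) and `constraint_loss` as placeholders; this shows the common guidance pattern, not necessarily the paper's exact algorithm.

```python
import torch

def constrained_update(eps_model, x_t, t, alpha_bar_t, constraint_loss, step_size=1.0):
    """One guidance update on the clean estimate: form the Tweedie prediction
    x0_hat from the noisy sample, evaluate a differentiable constraint on it
    (e.g., masked-pixel MSE), and move x_t along the resulting gradient."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Predicted clean image from the current noisy sample (Tweedie estimate).
    x0_hat = (x_t - (1.0 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
    grad = torch.autograd.grad(constraint_loss(x0_hat), x_t)[0]
    return (x_t - step_size * grad).detach()
```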
- COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing [57.76170824395532]
Video editing is an emerging task in which most current methods adopt a pre-trained text-to-image (T2I) diffusion model to edit the source video.
We propose COrrespondence-guided Video Editing (COVE) to achieve high-quality and consistent video editing.
COVE can be seamlessly integrated into the pre-trained T2I diffusion model without the need for extra training or optimization.
arXiv Detail & Related papers (2024-06-13T06:27:13Z)
- FlowIE: Efficient Image Enhancement via Rectified Flow [71.6345505427213]
FlowIE is a flow-based framework that estimates straight-line paths from an elementary distribution to high-quality images.
Our contributions are rigorously validated through comprehensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-06-01T17:29:29Z)
- Improving the Training of Rectified Flows [14.652876697052156]
Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE.
One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error.
We propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting.
Our improved rectified flow outperforms state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings.
arXiv Detail & Related papers (2024-05-30T17:56:04Z)
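For context, the baseline these improved techniques build on is the standard rectified-flow objective plus the reflow step. A minimal sketch with a placeholder network `v_theta` follows; the paper's specific training improvements are not reflected here.

```python
import torch

def rectified_flow_loss(v_theta, x0, x1):
    """Regress the velocity field onto the straight-line displacement between
    paired endpoints (x0, x1): the standard rectified-flow objective."""
    t = torch.rand(x0.shape[0], device=x0.device)
    t_b = t.view(-1, *([1] * (x0.dim() - 1)))   # broadcast over image dims
    x_t = (1.0 - t_b) * x0 + t_b * x1
    target = x1 - x0
    return ((v_theta(x_t, t) - target) ** 2).mean()

@torch.no_grad()
def reflow_pairs(v_theta, x0, steps=100):
    """Reflow: re-pair each source sample with the current model's own output,
    so the next training round fits straighter paths that suffer less
    truncation error at low step counts."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * v_theta(x, t)
    return x0, x
```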
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation [43.61383132919089]
Controllable music generation methods are critical for human-centered AI-based music creation.
We propose Distilled Diffusion Inference-Time T-Optimization (DITTO-2), a new method to speed up inference-time optimization-based control.
arXiv Detail & Related papers (2024-05-30T17:40:11Z)
- TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing [12.504661526518234]
We present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing.
We propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization.
Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth.
arXiv Detail & Related papers (2024-04-17T07:08:38Z)
- D-Flow: Differentiating through Flows for Controlled Generation [37.80603174399585]
We introduce D-Flow, a framework for controlling the generation process by differentiating through the flow.
We motivate this framework with the key observation that, for diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects the gradient onto the data manifold.
We validate our framework on linear and non-linear controlled generation problems, including image and audio inverse problems and conditional molecule generation, reaching state-of-the-art performance across all of them.
arXiv Detail & Related papers (2024-02-21T18:56:03Z)
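Differentiating through the flow can be pictured as optimizing the source point by backpropagating through an unrolled ODE solve. A schematic sketch with illustrative names (`task_loss` stands for whatever differentiable control objective is used, e.g., data fidelity in an inverse problem):

```python
import torch

def d_flow_optimize(v_theta, task_loss, x0_init, ode_steps=10, opt_steps=20, lr=0.1):
    """Controlled generation by differentiating through the flow: optimize the
    source point x0 so the generated sample x1 minimizes a task loss."""
    x0 = x0_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x0], lr=lr)
    for _ in range(opt_steps):
        x, dt = x0, 1.0 / ode_steps
        for i in range(ode_steps):            # differentiable Euler solve
            t = torch.full((x.shape[0],), i * dt, device=x.device)
            x = x + dt * v_theta(x, t)
        loss = task_loss(x)
        opt.zero_grad()
        loss.backward()                       # backprop through the trajectory
        opt.step()
    return x0.detach()
```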
- Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
arXiv Detail & Related papers (2023-12-17T21:49:59Z)
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improve sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models to plan generation in the offline reinforcement learning setting, with a substantial speedup in computation compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
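Guidance for flow models can be read as classifier-free guidance applied to the velocity field: extrapolate from the unconditional toward the conditional prediction. A minimal sketch, assuming a conditioning-aware velocity net `v_theta` and placeholder `cond`/`null_cond` inputs:

```python
def guided_velocity(v_theta, x, t, cond, null_cond, w=2.0):
    """Classifier-free-style guidance for a flow's velocity field: blend the
    conditional and unconditional predictions with guidance weight w."""
    v_cond = v_theta(x, t, cond)
    v_uncond = v_theta(x, t, null_cond)
    return v_uncond + w * (v_cond - v_uncond)
```

With w=1 this reduces to plain conditional sampling; larger w trades diversity for conditioning fidelity, mirroring classifier-free guidance in diffusion models.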