Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
- URL: http://arxiv.org/abs/2412.16906v2
- Date: Tue, 25 Mar 2025 03:47:02 GMT
- Title: Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
- Authors: Quan Dao, Hao Phung, Trung Dao, Dimitris Metaxas, Anh Tran
- Abstract summary: Flow matching has emerged as a promising framework for training generative models. We introduce a self-corrected flow distillation method that integrates consistency models and adversarial training. This work is a pioneer in achieving consistent generation quality in both few-step and one-step sampling.
- Score: 3.8959351616076745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Flow matching has emerged as a promising framework for training generative models, demonstrating impressive empirical performance while offering relative ease of training compared to diffusion-based models. However, this method still requires numerous function evaluations in the sampling process. To address these limitations, we introduce a self-corrected flow distillation method that effectively integrates consistency models and adversarial training within the flow-matching framework. This work is a pioneer in achieving consistent generation quality in both few-step and one-step sampling. Our extensive experiments validate the effectiveness of our method, yielding superior results both quantitatively and qualitatively on CelebA-HQ and zero-shot benchmarks on the COCO dataset. Our implementation is released at https://github.com/VinAIResearch/SCFlow
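To illustrate why flow-matching samplers can need many function evaluations, here is a minimal sketch on a one-dimensional toy problem. It is not taken from the paper or the SCFlow repository: the Gaussian data distribution, the constants `MU` and `S`, and the plain Euler solver are all assumptions chosen because the velocity field and the exact ODE endpoint are then available in closed form.

```python
MU, S = 2.0, 0.5  # hypothetical toy data distribution N(MU, S^2)

def velocity(x, t):
    """Marginal velocity of the linear path x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, 1) and x1 ~ N(MU, S^2) drawn independently."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2  # variance of x_t
    return MU + (-(1.0 - t) + t * S ** 2) / var_t * (x - t * MU)

def euler_sample(x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1,
    using num_steps velocity evaluations."""
    x, h = x0, 1.0 / num_steps
    for k in range(num_steps):
        x += h * velocity(x, k * h)
    return x

x0 = 1.0
exact = MU + S * x0  # closed-form endpoint of the ODE for this toy case
errors = {n: abs(euler_sample(x0, n) - exact) for n in (1, 4, 16, 100)}
for n, err in errors.items():
    print(f"{n:>3} steps: |error| = {err:.4f}")
```

In this toy setting a single Euler step maps every starting point to the mean `MU`, so the sample's spread is lost, while many small steps approach the exact endpoint. Distillation methods such as the one summarized above aim to close that gap without the extra evaluations.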
Related papers
- Align Your Flow: Scaling Continuous-Time Flow Map Distillation [63.927438959502226]
Flow maps connect any two noise levels in a single step and remain effective across all step counts. We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks. We show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.
arXiv Detail & Related papers (2025-06-17T15:06:07Z) - Contrastive Flow Matching [61.60002028726023]
We introduce Contrastive Flow Matching, an extension to the flow matching objective that explicitly enforces uniqueness across all conditional flows. Our approach adds a contrastive objective that maximizes dissimilarities between predicted flows from arbitrary sample pairs. We find that training models with Contrastive Flow Matching (1) improves training speed by a factor of up to 9x, (2) requires up to 5x fewer de-noising steps and (3) lowers FID by up to 8.9 compared to training the same models with flow matching.
arXiv Detail & Related papers (2025-06-05T17:59:58Z) - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models.
We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion [9.8078769718432]
We propose an efficient quantization framework for Stable Diffusion models. Our approach features a Serial-to-Parallel calibration pipeline that addresses the consistency of both the calibration and inference processes. Under W4A8 quantization settings, our approach enhances both distribution similarity and visual similarity by 45%-60%.
arXiv Detail & Related papers (2024-12-09T17:00:20Z) - Unraveling the Connections between Flow Matching and Diffusion Probabilistic Models in Training-free Conditional Generation [7.3604864243987365]
We propose Flow Matching-based Posterior Sampling (FMPS) to expand the application scope of flow matching. Its correction term can be reformulated to incorporate a surrogate score function. We show that FMPS achieves superior generation quality compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-11-12T08:14:39Z) - FlowTS: Time Series Generation via Rectified Flow [67.41208519939626]
FlowTS is an ODE-based model that leverages rectified flow with straight-line transport in probability space.
In the unconditional setting, FlowTS achieves state-of-the-art performance, with context FID scores of 0.019 and 0.011 on the Stock and ETTh datasets.
In the conditional setting, it achieves superior performance in solar forecasting.
arXiv Detail & Related papers (2024-11-12T03:03:23Z) - Consistency Flow Matching: Defining Straight Flows with Velocity Consistency [97.28511135503176]
We introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field.
Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models.
arXiv Detail & Related papers (2024-07-02T16:15:37Z) - Improving Consistency Models with Generator-Induced Flows [16.049476783301724]
Consistency models imitate the multi-step sampling of score-based diffusion in a single forward pass of a neural network.
They can be learned in two ways: consistency distillation and consistency training.
We propose a novel flow that transports noisy data towards their corresponding outputs derived from the currently trained model.
arXiv Detail & Related papers (2024-06-13T20:22:38Z) - Flow map matching with stochastic interpolants: A mathematical framework for consistency models [15.520853806024943]
Flow Map Matching is a principled framework for learning the two-time flow map of an underlying generative model. We show that FMM unifies and extends a broad class of existing approaches for fast sampling.
arXiv Detail & Related papers (2024-06-11T17:41:26Z) - Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation [62.30570286073223]
Diffusion-based text-to-image generation models have demonstrated the ability to produce images aligned with textual descriptions.
We introduce a data-free guided distillation method that enables the efficient distillation of pretrained Diffusion models without access to the real training data.
By exclusively training with synthetic images generated by its one-step generator, our data-free distillation method rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score.
arXiv Detail & Related papers (2024-06-03T17:44:11Z) - Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows [53.31856123113228]
This paper proposes Language Rectified Flow.
Our method is based on the reformulation of the standard probabilistic flow models.
Experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many NLP tasks.
arXiv Detail & Related papers (2024-03-25T17:58:22Z) - One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, with a speedup compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - Flow Matching in Latent Space [2.9330609943398525]
Flow matching is a framework to train generative models that exhibits impressive empirical performance.
We propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency.
Our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks.
arXiv Detail & Related papers (2023-07-17T17:57:56Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Enhancing Text Generation with Cooperative Training [23.971227375706327]
Most prevailing methods trained generative and discriminative models in isolation, which left them unable to adapt to changes in each other.
We introduce a self-consistent learning framework in the text field that involves training a discriminator and generator cooperatively in a closed-loop manner.
Our framework is able to mitigate training instabilities such as mode collapse and non-convergence.
arXiv Detail & Related papers (2023-03-16T04:21:19Z) - Modeling Score Distributions and Continuous Covariates: A Bayesian Approach [8.772459063453285]
We develop a generative model of the match and non-match score distributions over continuous covariates.
We use mixture models to capture arbitrary distributions and local basis functions.
Three experiments demonstrate the accuracy and effectiveness of our approach.
arXiv Detail & Related papers (2020-09-21T02:41:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.