Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation
- URL: http://arxiv.org/abs/2602.00413v1
- Date: Sat, 31 Jan 2026 00:06:55 GMT
- Title: Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation
- Authors: Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng,
- Abstract summary: Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation.<n>We propose a novel alignment framework by leveraging the underlying nature of the alignment problem.<n>We achieve comparable performance to finetuning-based models with one-step generation with at least a 60% reduction in computational cost.
- Score: 39.484148941369234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function, these approaches require extensive computational resources and may not generalize well across different objectives. In this work, we propose a novel alignment framework by leveraging the underlying nature of the alignment problem -- sampling from reward-weighted distributions -- and show that it applies to both diffusion models (via score guidance) and flow matching models (via velocity guidance). The score function (velocity field) required for the reward-weighted distribution can be decomposed into the pre-trained score (velocity field) plus a conditional expectation of the reward. For the alignment on the diffusion model, we identify a fundamental challenge: the adversarial nature of the guidance term can introduce undesirable artifacts in the generated images. Therefore, we propose a finetuning-free framework that trains a guidance network to estimate the conditional expectation of the reward. We achieve comparable performance to finetuning-based models with one-step generation with at least a 60% reduction in computational cost. For the alignment on flow matching, we propose a training-free framework that improves the generation quality without additional computational cost.
Related papers
- Composition and Alignment of Diffusion Models using Constrained Learning [79.36736636241564]
Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions.<n>Two commonly used methods are: (i) alignment, which involves fine-tuning a diffusion model to align it with a reward; and (ii) composition, which combines several pre-trained diffusion models, each emphasizing a desirable attribute in the generated outputs.<n>We propose a constrained optimization framework that unifies alignment and composition of diffusion models by enforcing that the aligned model satisfies reward constraints and/or remains close to (potentially multiple) pre-trained models.
arXiv Detail & Related papers (2025-08-26T15:06:30Z) - Aligning Latent Spaces with Flow Priors [72.24305287508474]
This paper presents a novel framework for aligning learnable latent spaces to arbitrary target distributions by leveraging flow-based generative models as priors.<n> Notably, the proposed method eliminates computationally expensive likelihood evaluations and avoids ODE solving during optimization.
arXiv Detail & Related papers (2025-06-05T16:59:53Z) - Flow Matching Posterior Sampling: A Training-free Conditional Generation for Flow Matching [13.634043135217254]
We propose Flow Matching-based Posterior Sampling (FMPS) to expand its application scope.<n>This correction term can be reformulated to incorporate a surrogate score function.<n>We show that FMPS achieves superior generation quality compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-11-12T08:14:39Z) - Flow map matching with stochastic interpolants: A mathematical framework for consistency models [15.520853806024943]
Flow Map Matching is a principled framework for learning the two-time flow map of an underlying generative model.<n>We show that FMM unifies and extends a broad class of existing approaches for fast sampling.
arXiv Detail & Related papers (2024-06-11T17:41:26Z) - Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text synthesis-to-speech.
Notably, we are first to apply flow models for plan generation in the offline reinforcement learning setting ax speedup in compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - Precision-Recall Divergence Optimization for Generative Modeling with
GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the textitPR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.