Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages
- URL: http://arxiv.org/abs/2602.01591v1
- Date: Mon, 02 Feb 2026 03:32:00 GMT
- Title: Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages
- Authors: Zhixiong Yue, Zixuan Ni, Feiyang Ye, Jinshan Zhang, Sheng Shen, Zhenpeng Mi
- Abstract summary: We propose a novel framework for training flow matching text-to-image models into efficient few-step generators that are well aligned with human preferences. We show that TAFS-GRPO achieves strong performance in few-step text-to-image generation and significantly improves the alignment of generated images with human preferences.
- Score: 6.470160796651034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in flow matching models, particularly with reinforcement learning (RL), have significantly enhanced human preference alignment in few-step text-to-image generators. However, existing RL-based approaches for flow matching models typically rely on numerous denoising steps while suffering from sparse and imprecise reward signals, which often lead to suboptimal alignment. To address these limitations, we propose Temperature-Annealed Few-step Sampling with Group Relative Policy Optimization (TAFS-GRPO), a novel framework for training flow matching text-to-image models into efficient few-step generators well aligned with human preferences. Our method iteratively injects adaptive temporal noise onto the results of one-step sampling. By repeatedly annealing the model's sampled outputs, it introduces stochasticity into the sampling process while preserving the semantic integrity of each generated image. Moreover, its step-aware advantage integration mechanism combines with GRPO to avoid the need for a differentiable reward function and to provide dense, step-specific rewards for stable policy optimization. Extensive experiments demonstrate that TAFS-GRPO achieves strong performance in few-step text-to-image generation and significantly improves the alignment of generated images with human preferences. The code and models of this work will be made available to facilitate further research.
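Since the abstract gives only the high-level mechanism, the following is a minimal PyTorch sketch of what temperature-annealed re-sampling and step-aware, group-relative advantages might look like. The annealing schedule, latent shape, and `flow_model` interface are all assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of the sampling-and-advantage scheme described in the abstract.
# The annealing schedule, latent shape, and flow_model interface are assumed
# for illustration; the paper's actual algorithm may differ.
import torch

def tafs_sample(flow_model, prompt_emb, num_steps=4, temps=None):
    """Iteratively re-noise ('anneal') one-step samples with decreasing temperature."""
    if temps is None:
        temps = torch.linspace(0.8, 0.1, num_steps)   # assumed annealing schedule
    x = torch.randn(1, 4, 64, 64)                     # assumed latent shape
    per_step_samples = []
    for step, temp in enumerate(temps):
        x0_hat = flow_model(x, prompt_emb, step)      # one-step prediction of a clean sample
        x = x0_hat + temp * torch.randn_like(x0_hat)  # inject adaptive temporal noise
        per_step_samples.append(x0_hat)
    return per_step_samples                           # each can be scored by a reward model

def step_aware_advantages(rewards):
    """Group-relative advantages per step; rewards has shape (group_size, num_steps).
    Normalizing within the group avoids needing a differentiable reward."""
    mean = rewards.mean(dim=0, keepdim=True)
    std = rewards.std(dim=0, keepdim=True) + 1e-8
    return (rewards - mean) / std                     # dense, step-specific signal
```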
Related papers
- MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation [21.160947261963088]
We present MaskFocus, a novel RL framework that achieves effective policy optimization for masked generative models. Specifically, we determine the step-level information gain by measuring the similarity between the intermediate images at each sampling step and the final generated image. We leverage this to identify the most critical and valuable steps and execute focused policy optimization on them.
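A rough illustration of the step-selection idea follows; cosine similarity is an assumption here, since the summary does not name the metric.

```python
# Sketch of selecting critical steps by step-level information gain, measured
# as the change in similarity between each intermediate image and the final
# one. Cosine similarity is an assumption, not necessarily MaskFocus's metric.
import torch
import torch.nn.functional as F

def critical_steps(intermediates, final_image, k=2):
    """intermediates: list of per-step image tensors; returns top-k step indices."""
    final = final_image.flatten()
    sims = torch.stack([F.cosine_similarity(x.flatten(), final, dim=0)
                        for x in intermediates])
    gains = sims.diff(prepend=sims.new_zeros(1))   # per-step information gain
    return torch.topk(gains, k).indices            # focus policy optimization here
```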
arXiv Detail & Related papers (2025-12-21T15:08:31Z)
- ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion [18.25085327318649]
We develop a text-to-image (T2I) diffusion model based on backward discretizations, dubbed ProxT2I, relying on learned and conditional proximal operators instead of score functions. We develop a new large-scale and open-source dataset comprising 15 million high-quality human images with fine-grained captions, called LAION-Face-T2I-15M, for training and evaluation.
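For readers unfamiliar with proximal operators, the classical soft-thresholding operator below is the textbook closed-form example of what ProxT2I replaces with a learned, conditional network; this is background, not the paper's code.

```python
# Proximal operator of lam * ||.||_1 (soft-thresholding), the classical
# closed-form example; ProxT2I instead *learns* a conditional proximal
# operator and uses it in backward discretizations.
import torch

def soft_threshold(x: torch.Tensor, lam: float) -> torch.Tensor:
    """prox_{lam*||.||_1}(x) = argmin_z 0.5*||z - x||^2 + lam*||z||_1"""
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)
```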
arXiv Detail & Related papers (2025-11-24T04:10:53Z)
- Efficiently Generating Correlated Sample Paths from Multi-step Time Series Foundation Models [66.60042743462175]
We present a copula-based approach to efficiently generate accurate, correlated sample paths from time series foundation models. Our approach generates correlated sample paths orders of magnitude faster than autoregressive sampling.
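A minimal Gaussian-copula sketch of the stated idea, assuming independent per-step marginal draws from the foundation model and a user-supplied step-to-step correlation matrix; both are assumptions, and the paper's copula construction may differ.

```python
# Turn independent per-step marginal samples into correlated sample paths via
# a Gaussian copula. `corr` is an assumed (horizon x horizon) correlation matrix.
import torch

def copula_paths(marginal_samples, corr, n_paths):
    """marginal_samples: (horizon, n_mc) independent per-step draws from the model."""
    horizon = marginal_samples.shape[0]
    L = torch.linalg.cholesky(corr)                  # factor the correlation matrix
    z = torch.randn(n_paths, horizon) @ L.T          # correlated Gaussians
    u = torch.distributions.Normal(0.0, 1.0).cdf(z)  # correlated uniforms in (0, 1)
    # invert each step's empirical quantile function to recover the marginals
    return torch.stack([torch.quantile(marginal_samples[t], u[:, t])
                        for t in range(horizon)], dim=1)
```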
arXiv Detail & Related papers (2025-10-02T17:08:58Z)
- Transport Based Mean Flows for Generative Modeling [19.973366424307077]
Flow-matching generative models have emerged as a powerful paradigm for continuous data generation. These models suffer from slow inference due to the requirement of numerous sequential sampling steps. Recent work has sought to accelerate inference by reducing the number of sampling steps.
arXiv Detail & Related papers (2025-09-26T17:12:19Z)
- Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
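The following schematic compresses trajectory-level refinement via particle Gibbs to its essentials: keep one reference trajectory, propose fresh candidates around it each sweep, and reselect by reward. `propose` and `reward` are assumed user-supplied functions, and the actual PG-DLM algorithm embeds the reference inside a full SMC sweep.

```python
# Compressed particle-Gibbs sketch: whole trajectories (not single steps) are
# refined by reselecting a reference among reward-weighted candidates.
# propose/reward are assumed interfaces, not the paper's code.
import torch

def particle_gibbs(propose, reward, init_traj, n_particles=8, n_sweeps=3):
    ref = init_traj
    for _ in range(n_sweeps):
        particles = [ref] + [propose(ref) for _ in range(n_particles - 1)]
        logw = torch.tensor([reward(p) for p in particles])  # reward as log-weight
        idx = int(torch.distributions.Categorical(logits=logw).sample())
        ref = particles[idx]   # retained reference for the next sweep
    return ref
```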
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
- Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization [1.1510009152620668]
Fine-tuning pre-trained generative models with Reinforcement Learning (RL) has emerged as an effective approach for aligning outputs with human preferences. We show that RL-based fine-tuning is both efficient and effective for VAR models, benefiting particularly from their fast inference speeds.
arXiv Detail & Related papers (2025-05-29T10:45:38Z)
- Policy Optimized Text-to-Image Pipeline Design [73.9633527029941]
We introduce a novel reinforcement learning-based framework for text-to-image generation. Our approach first trains an ensemble of reward models capable of predicting image quality scores directly from prompt-workflow combinations. We then implement a two-phase training strategy: initial vocabulary training followed by GRPO-based optimization.
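The reward-ensemble idea reduces to scoring a (prompt, workflow) pair with several models and aggregating; a toy stand-in follows, where the architecture and input featurization are placeholders, not the paper's design.

```python
# Toy reward-model ensemble over prompt-workflow embeddings; the architecture
# and featurization are placeholders for illustration only.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, prompt_workflow_emb):
        return self.net(prompt_workflow_emb).squeeze(-1)  # scalar quality score

def ensemble_reward(models, emb):
    """Mean score across the ensemble guides the GRPO phase."""
    return torch.stack([m(emb) for m in models]).mean(dim=0)
```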
arXiv Detail & Related papers (2025-05-27T17:50:47Z)
- Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization [46.50233461744791]
Preference optimization for diffusion models aims to align them with human preferences for images. We show that pre-trained diffusion models are naturally suited for step-level reward modeling in the noisy latent space. We introduce Latent Preference Optimization (LPO), a step-level preference optimization method conducted directly in the noisy latent space.
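A schematic of a step-level preference loss in the noisy latent space follows; the logistic (DPO-style) form is an assumption, since the summary only states that scoring happens on noisy latents.

```python
# Step-level preference loss sketch: a noise-aware reward model scores the
# preferred and dispreferred noisy latents at the same timestep. The logistic
# loss is an assumption (DPO-style), not necessarily LPO's exact objective.
import torch
import torch.nn.functional as F

def step_preference_loss(reward_model, z_win, z_lose, t):
    r_w = reward_model(z_win, t)    # score noisy latents directly
    r_l = reward_model(z_lose, t)
    return -F.logsigmoid(r_w - r_l).mean()
```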
arXiv Detail & Related papers (2025-02-03T04:51:28Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks. Our approach enables versatile capabilities via different inference-time sampling schemes. Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have well-known drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
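A shape-level sketch of the named components follows: a sequence encoder over the user history and a cross-attentive denoising decoder invoked per diffusion step. All sizes and module choices are placeholders, not the paper's architecture.

```python
# Shape-level sketch of a conditional denoising step for sequential
# recommendation; module sizes and choices are placeholders only.
import torch
import torch.nn as nn

class CrossAttnDenoiser(nn.Module):
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.encoder = nn.GRU(d, d, batch_first=True)  # sequence encoder
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.out = nn.Linear(d, d)

    def forward(self, noisy_item, history):
        ctx, _ = self.encoder(history)       # (B, L, d) encoded interaction history
        q = noisy_item.unsqueeze(1)          # (B, 1, d) noisy target item
        h, _ = self.attn(q, ctx, ctx)        # cross-attentive denoising decode
        return self.out(h.squeeze(1))        # predicted denoised item embedding
```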
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- One-Shot Adaptation of GAN in Just One CLIP [51.188396199083336]
We present a novel single-shot GAN adaptation method through unified CLIP space manipulations.
Specifically, our model employs a two-step training strategy, the first step of which is reference image search in the source generator using CLIP-guided latent optimization.
We show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively.
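The CLIP-guided latent optimization step amounts to searching the source generator's latent space for a code whose output matches the reference in CLIP space; a hedged sketch follows, with `G`, `clip_encode`, and the latent dimensionality as assumed interfaces.

```python
# CLIP-guided latent search sketch: optimize a latent code so the generated
# image's CLIP embedding matches the reference's. G, clip_encode, and the
# latent size are assumed interfaces, not the paper's code.
import torch

def find_latent(G, clip_encode, ref_feat, steps=200, lr=0.05):
    w = torch.randn(1, 512, requires_grad=True)   # assumed latent size
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        feat = clip_encode(G(w))                  # image -> CLIP embedding
        loss = 1.0 - torch.cosine_similarity(feat, ref_feat).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```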
arXiv Detail & Related papers (2022-03-17T13:03:06Z)