FAIL: Flow Matching Adversarial Imitation Learning for Image Generation
- URL: http://arxiv.org/abs/2602.12155v1
- Date: Thu, 12 Feb 2026 16:36:33 GMT
- Title: FAIL: Flow Matching Adversarial Imitation Learning for Image Generation
- Authors: Yeyao Ma, Chen Li, Xiaosong Zhang, Han Hu, Weidi Xie
- Abstract summary: Post-training of flow matching models, aligning the output distribution with a high-quality target, is mathematically equivalent to imitation learning. We propose Flow Matching Adversarial Imitation Learning (FAIL), which minimizes policy-expert divergence through adversarial training without explicit rewards or pairwise comparisons.
- Score: 52.643484089126844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Post-training of flow matching models, aligning the output distribution with a high-quality target, is mathematically equivalent to imitation learning. While Supervised Fine-Tuning mimics expert demonstrations effectively, it cannot correct policy drift in unseen states. Preference optimization methods address this but require costly preference pairs or reward modeling. We propose Flow Matching Adversarial Imitation Learning (FAIL), which minimizes policy-expert divergence through adversarial training without explicit rewards or pairwise comparisons. We derive two algorithms: FAIL-PD exploits differentiable ODE solvers for low-variance pathwise gradients, while FAIL-PG provides a black-box alternative for discrete or computationally constrained settings. Fine-tuning FLUX with only 13,000 demonstrations from Nano Banana Pro, FAIL achieves competitive performance on prompt following and aesthetic benchmarks. Furthermore, the framework generalizes effectively to discrete image and video generation, and functions as a robust regularizer to mitigate reward hacking in reward-based optimization. Code and data are available at https://github.com/HansPolo113/FAIL.
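The abstract's black-box variant (FAIL-PG) can be illustrated with a minimal sketch. Everything below is a toy stand-in, not the paper's implementation: the "policy" is a 1-D Gaussian sampler rather than a flow ODE, the "expert demonstrations" are synthetic samples, and the implicit reward is the log of a logistic discriminator trained adversarially, with the policy updated via a score-function (REINFORCE-style) gradient, i.e. without differentiating through the sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "expert demonstrations": samples from the high-quality target.
expert = rng.normal(2.0, 0.5, size=5000)

# Toy "policy": x = mu + exp(log_sigma) * z, a 1-D stand-in for a flow sampler.
mu, log_sigma = 0.0, 0.0
# Discriminator: logistic regression D(x) = sigmoid(a*x + b).
a, b = 0.0, 0.0

lr_d, lr_p, batch = 0.05, 0.02, 256
for step in range(2000):
    sigma = np.exp(log_sigma)
    x_pi = mu + sigma * rng.normal(size=batch)

    # Discriminator ascent on E[log D(expert)] + E[log(1 - D(policy))].
    x_e = rng.choice(expert, size=batch)
    d_e, d_pi = sigmoid(a * x_e + b), sigmoid(a * x_pi + b)
    a += lr_d * (np.mean((1 - d_e) * x_e) - np.mean(d_pi * x_pi))
    b += lr_d * (np.mean(1 - d_e) - np.mean(d_pi))

    # Black-box policy update: reward = log D(x), score-function gradient
    # (no differentiation through the sampler, as in a FAIL-PG-style setup).
    r = np.log(sigmoid(a * x_pi + b) + 1e-8)
    r -= r.mean()                               # baseline for variance reduction
    score_mu = (x_pi - mu) / sigma**2           # d log p(x) / d mu
    score_ls = (x_pi - mu)**2 / sigma**2 - 1.0  # d log p(x) / d log_sigma
    mu += lr_p * np.mean(r * score_mu)
    log_sigma += lr_p * np.mean(r * score_ls)

print(f"policy mean after training: {mu:.2f} (expert mean = 2.0)")
```

The policy mean drifts toward the expert mean even though the reward is never defined explicitly; the pathwise variant (FAIL-PD) would instead backpropagate the discriminator's gradient through the sampler.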
Related papers
- Improving Flow Matching by Aligning Flow Divergence [10.1227026659152]
Conditional flow matching (CFM) stands out as an efficient, simulation-free approach for training flow-based generative models. We introduce a new partial differential equation characterization for the error between the learned and exact probability paths, along with its solution. We show that the total variation gap between the two probability paths is bounded above by a combination of the CFM loss and an associated divergence loss.
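For context on the CFM loss that the bound above combines with a divergence term, here is a minimal sketch of the conditional flow matching objective on a straight-line path. The 1-D source and target distributions are illustrative assumptions, not from the paper, and the "model" is just a constant velocity field:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x0 = rng.normal(0.0, 1.0, n)    # source samples (hypothetical base distribution)
x1 = rng.normal(3.0, 0.5, n)    # target samples (hypothetical data distribution)
t = rng.uniform(0.0, 1.0, n)

x_t = (1 - t) * x0 + t * x1     # straight-line probability path
v_target = x1 - x0              # conditional velocity along that path

def cfm_loss(v_const):
    """CFM objective for the simplest possible model: a constant velocity field."""
    return np.mean((v_const - v_target) ** 2)

# For a constant field, the minimizer is E[x1 - x0], here roughly 3.0 - 0.0.
v_star = v_target.mean()
print(f"v* = {v_star:.3f}, loss(v*) = {cfm_loss(v_star):.3f}")
```

A real model would predict v(x_t, t) with a network; the regression-to-conditional-velocity structure is the same.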
arXiv Detail & Related papers (2026-01-31T19:07:54Z) - Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback [41.41713036839503]
We introduce Edit-R1, a novel post-training framework for instruction-based image editing based on policy optimization. We employ a Multimodal Large Language Model (MLLM) as a unified, training-free reward model, leveraging its output logits to provide fine-grained feedback. Our framework is model-agnostic, delivering substantial performance gains when applied to diverse base models.
arXiv Detail & Related papers (2025-10-19T15:38:06Z) - MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models [86.07486858219137]
Diffusion models excel at generating images conditioned on text prompts. The resulting images often do not satisfy user-specific criteria measured by scalar rewards such as Aesthetic Scores. Recently, inference-time alignment via noise optimization has emerged as an efficient alternative. We show that this approach suffers from reward hacking, where the model produces images that score highly, yet deviate significantly from the original prompt.
arXiv Detail & Related papers (2025-10-02T00:47:36Z) - Beyond Optimal Transport: Model-Aligned Coupling for Flow Matching [59.97254029720877]
Flow Matching (FM) is an effective framework for training a model to learn a vector field that transports samples from a source distribution to a target distribution. We propose Model-Aligned Coupling (MAC), an effective method that matches training couplings based on geometric distance. Experiments show that MAC significantly improves generation quality and efficiency in few-step settings compared to existing methods.
arXiv Detail & Related papers (2025-05-29T11:10:41Z) - Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards [52.90573877727541]
Reinforcement learning (RL) has been considered for diffusion model fine-tuning. RL's effectiveness is limited by the challenge of sparse rewards. B$^2$-DiffuRL is compatible with existing optimization algorithms.
arXiv Detail & Related papers (2025-03-14T09:45:19Z) - Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables training machine learning models while preserving participants' privacy. There is no differentially private distributed method for such training problems. We introduce a new distributed algorithm, $\alpha$-NormEC, with provable convergence guarantees.
arXiv Detail & Related papers (2025-02-19T07:10:32Z) - Test-time Alignment of Diffusion Models without Reward Over-optimization [8.981605934618349]
Diffusion models excel in generative tasks, but aligning them with specific objectives remains challenging. We propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization.
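The core idea of sampling from a reward-aligned target can be sketched in a single importance-resampling step, the building block of SMC. This is a toy, one-step stand-in with a hypothetical Gaussian base sampler and quadratic reward, not the paper's sampler:

```python
import numpy as np

rng = np.random.default_rng(2)

# Particles from a hypothetical pretrained "base" sampler: N(0, 1).
particles = rng.normal(0.0, 1.0, size=4000)

def reward(x):
    # Hypothetical scalar reward peaking at x = 1.5 (a stand-in for e.g. an
    # aesthetic score); not any real benchmark metric.
    return -(x - 1.5) ** 2

lam = 0.5                                   # temperature of the reward tilt
w = np.exp(reward(particles) / lam)         # weights proportional to exp(r / lambda)
w /= w.sum()
idx = rng.choice(particles.size, size=particles.size, p=w)
aligned = particles[idx]                    # resampled, reward-tilted particles

print(f"base mean {particles.mean():.2f} -> aligned mean {aligned.mean():.2f}")
```

Resampling shifts mass toward high-reward regions while staying anchored to the base distribution, which is what keeps this kind of method from over-optimizing the reward; full SMC repeats weight/resample steps along the denoising trajectory.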
arXiv Detail & Related papers (2025-01-10T09:10:30Z) - Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining. Our approach works by controlling the noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization. Our experiments show that the proposed approach significantly improves the average aesthetic scores in text-to-image generation.
arXiv Detail & Related papers (2024-10-08T07:33:49Z) - Deep Implicit Optimization enables Robust Learnable Features for Deformable Image Registration [20.34181966545357]
Existing Deep Learning in Image Registration (DLIR) methods do not explicitly incorporate optimization as a layer in a deep network. We show that our method bridges the gap between statistical learning and optimization by explicitly incorporating optimization as a layer in a deep network. Our framework shows excellent performance on in-domain datasets, and is agnostic to domain shift.
arXiv Detail & Related papers (2024-06-11T15:28:48Z) - Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
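End-to-end reward backpropagation reduces, in the simplest case, to a pathwise (reparameterization) gradient through the sampler. Below is a toy sketch with a hypothetical one-step Gaussian "sampler" and a quadratic reward, with the gradient written out by hand instead of via autodiff; AlignProp itself backpropagates through a full multi-step diffusion chain:

```python
import numpy as np

rng = np.random.default_rng(3)

mu = 0.0           # the only trainable "model" parameter in this toy
sigma = 0.5

for _ in range(300):
    z = rng.normal(size=256)
    x = mu + sigma * z                 # one-step differentiable "sampler"
    # Reward r(x) = -(x - 2)^2. Pathwise gradient: dE[r]/dmu = E[r'(x) * dx/dmu]
    # with dx/dmu = 1, so the gradient estimate is the batch mean of r'(x).
    grad = np.mean(-2.0 * (x - 2.0))
    mu += 0.05 * grad                  # gradient ascent on the reward

print(f"mu after reward backprop: {mu:.3f} (reward peak at 2.0)")
```

Because the reward gradient flows directly into the parameter, convergence is much faster than with score-function estimators, which is the efficiency claim behind reward-backpropagation methods.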
arXiv Detail & Related papers (2023-10-05T17:59:18Z) - Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
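A zeroth-order oracle means gradients must be estimated from loss values alone. Here is a minimal illustration (toy linear loss with hypothetical hidden weights; the paper's Gaussian MRF covariance modeling is not reproduced) of estimating the gradient sign by central finite differences and taking one FGSM-style step:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy target "model": a linear loss with hidden weights w. The attacker only
# sees loss values (a zeroth-order oracle), never w itself.
w = rng.normal(size=10)

def loss_oracle(x):
    return float(w @ x)

x = np.zeros(10)
eps = 1e-4
# Estimate each partial derivative with a central finite difference,
# at the cost of 2 oracle queries per coordinate.
g_hat = np.array([
    (loss_oracle(x + eps * e) - loss_oracle(x - eps * e)) / (2 * eps)
    for e in np.eye(10)
])
x_adv = x + 0.1 * np.sign(g_hat)   # one FGSM-style step using the estimated signs
```

Coordinate-wise differencing costs 2d queries per step; query-efficient attacks like the one summarized above replace it with structured random perturbations so far fewer oracle calls are needed.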
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.