FuguReport

Posterior Augmented Flow Matching

Authors: George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman
Affiliations: University of Washington / Georgia Institute of Technology / University of California, Irvine / Allen Institute for AI / Hugging Face
Categories: Method / Generative Modeling / Posterior Augmented Flow Matching; Evaluation / Model Evaluation / Comparison using FID metric; Evaluation / Model Scaling / Generalization across model sizes
License: CC BY 4.0

Abstract Overview

The paper argues that standard flow matching (FM) provides sparse supervision because each intermediate latent state is paired with only one target trajectory, which can produce high-variance training signals and flow collapse in high-dimensional generation tasks. It introduces Posterior-Augmented Flow Matching (PAFM), which replaces single-target supervision with an expectation over multiple plausible target completions for a given intermediate state and condition. The method factorizes the intractable posterior into a conditional path likelihood and a condition likelihood, and uses self-normalized importance weighting to aggregate candidate targets during training. The authors prove that PAFM is an unbiased estimator of the FM objective while reducing gradient variance, and they evaluate it on class-conditional ImageNet-1K and text-to-image CC12M benchmarks across multiple architectures and model scales.
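The aggregation step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a linear interpolation path x_t = (1 - t) x_0 + t x_1 with standard Gaussian x_0, so that the conditional path likelihood of x_t given a candidate x_1 is Gaussian with mean t x_1 and standard deviation 1 - t. The function name `pafm_target` and the `cond_log_lik` argument are hypothetical.

```python
import numpy as np

def pafm_target(x_t, t, candidates, cond_log_lik=None):
    """Posterior-weighted velocity target for one intermediate state.

    x_t:          (D,) intermediate latent at time t in (0, 1)
    candidates:   (K, D) candidate clean targets
    cond_log_lik: (K,) log-likelihood of the condition under each candidate
                  (hypothetical input; None treats all candidates as equally
                  compatible with the condition)
    """
    K = candidates.shape[0]
    if cond_log_lik is None:
        cond_log_lik = np.zeros(K)
    sigma = 1.0 - t  # std of x_t given x_1 under the assumed linear path
    # log N(x_t; t * x_1, sigma^2 I), dropping constants shared by all candidates
    sq_dist = np.sum((x_t - t * candidates) ** 2, axis=1)
    log_w = -0.5 * sq_dist / sigma**2 + cond_log_lik
    # self-normalize in log space for numerical stability
    log_w = log_w - log_w.max()
    w = np.exp(log_w)
    w = w / w.sum()
    # conditional velocity of each candidate under the assumed linear path
    velocities = (candidates - x_t) / sigma
    return w @ velocities, w

rng = np.random.default_rng(0)
x1 = rng.normal(size=(4, 8))  # K=4 candidate targets, D=8
t = 0.7
x_t = (1 - t) * rng.normal(size=8) + t * x1[0]
target, w = pafm_target(x_t, t, x1)
```

With a single candidate (K=1) the weight is 1 and the target reduces to the usual single-target FM velocity, which is the sense in which this kind of aggregation generalizes standard FM supervision.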

Novelty

The main novelty is a reformulation of flow matching that supervises each intermediate point with a posterior-weighted mixture of valid targets rather than a single endpoint. The mixture weights come from factorizing the intractable posterior into a conditional path likelihood and a condition likelihood. The paper also contributes a practical self-normalized importance sampling implementation and shows that candidate targets can be constructed in several ways, including nearest neighbors, random spatial augmentations, and VAE moment resampling.
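Two of the candidate-construction strategies named above can be illustrated as follows. This is a rough sketch under stated assumptions: nearest neighbors are taken by Euclidean distance in a latent bank, and "VAE moment resampling" is interpreted as drawing fresh latents from the diagonal Gaussian defined by an encoder's mean and log-variance. The summary does not specify either detail, and both function names are hypothetical.

```python
import numpy as np

def nearest_neighbor_candidates(anchor, bank, k):
    """Return the k rows of `bank` closest to `anchor` in Euclidean distance."""
    dists = np.linalg.norm(bank - anchor, axis=1)
    idx = np.argsort(dists)[:k]
    return bank[idx]

def vae_resample_candidates(mean, log_var, k, rng):
    """Draw k latents from the diagonal Gaussian N(mean, exp(log_var))."""
    std = np.exp(0.5 * log_var)
    eps = rng.normal(size=(k, mean.shape[0]))
    return mean + std * eps

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 8))   # stand-in for a bank of encoded latents
anchor = bank[0]
nn_candidates = nearest_neighbor_candidates(anchor, bank, k=4)
vae_candidates = vae_resample_candidates(anchor, np.full(8, -2.0), k=4, rng=rng)
```

Because the anchor is itself in the bank, it is returned as the first nearest neighbor, so the original target stays in the candidate set.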

Results

Across ImageNet-1K and CC12M, PAFM consistently improves over standard FM, with reported gains including ImageNet FID improvements from 27.57 to 24.88 for SiT-B/2 (K=16), from 11.14 to 9.85 for SiT-XL/2 (K=16), and a CC12M improvement from 10.37 to 9.45 for MMDiT. The authors also report approximately 4× lower measured mini-batch gradient variance in an ImageNet study, with only a 6.6% throughput reduction and 0.4% memory increase for the nearest-neighbor variant with K=32.
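To put the reported numbers on a common scale, the short script below computes the absolute and relative FID reduction for each pair quoted above. It uses only the figures in this summary; the helper name `fid_gain` is illustrative.

```python
# (FM baseline FID, PAFM FID) as reported in the summary above
reported = {
    "SiT-B/2 (ImageNet, K=16)": (27.57, 24.88),
    "SiT-XL/2 (ImageNet, K=16)": (11.14, 9.85),
    "MMDiT (CC12M)": (10.37, 9.45),
}

def fid_gain(baseline, improved):
    """Return (absolute FID reduction, relative reduction in percent)."""
    absolute = baseline - improved
    relative = 100.0 * absolute / baseline
    return absolute, relative

for name, (fm, pafm) in reported.items():
    absolute, relative = fid_gain(fm, pafm)
    print(f"{name}: {absolute:.2f} FID lower ({relative:.1f}% relative)")
```

The three quoted pairs correspond to reductions of 2.69, 1.29, and 0.92 FID, or roughly 9.8%, 11.6%, and 8.9% relative to the FM baseline.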

Key Points

  1. PAFM generalizes flow matching by replacing one-to-one supervision with a posterior-weighted expectation over multiple plausible continuation trajectories for each intermediate latent, using self-normalized importance sampling with weights derived from the conditional path likelihood and condition likelihood.
  2. Theoretical analysis shows that PAFM is an unbiased estimator of the original FM objective and that it reduces gradient variance by a factor related to the Kish effective sample size. Consistent with this, an ImageNet study measures approximately 4× lower mini-batch gradient variance.
  3. Empirical evaluations on ImageNet-1K and CC12M show consistent FID improvements (up to 3.4 FID50K) across SiT-B/2, SiT-XL/2, and MMDiT models, while alternative candidate-selection strategies (augmentations and VAE resampling) also improve over standard FM with negligible computational overhead.
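For readers unfamiliar with the Kish effective sample size mentioned in point 2, the standard definition is ESS = (Σ w_k)² / Σ w_k². The sketch below computes it for a set of importance weights; it illustrates the quantity only and does not reproduce the paper's variance bound.

```python
def kish_ess(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return total * total / sum_sq

uniform = kish_ess([0.25, 0.25, 0.25, 0.25])   # all K=4 candidates contribute equally
degenerate = kish_ess([1.0, 0.0, 0.0, 0.0])    # one candidate carries all the weight
skewed = kish_ess([0.7, 0.1, 0.1, 0.1])        # somewhere in between
```

Uniform weights give an ESS equal to the number of candidates, while fully concentrated weights give an ESS of 1, which matches single-target supervision.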
