Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation
- URL: http://arxiv.org/abs/2601.22679v1
- Date: Fri, 30 Jan 2026 07:51:55 GMT
- Title: Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation
- Authors: Youngjoong Kim, Duhoe Kim, Woosung Kim, Jaesik Park
- Abstract summary: We provide a theoretical examination of consistency models by analyzing them from a flow map-based perspective. Building on these insights, we revisit self-distillation as a practical remedy for certain forms of suboptimal convergence.
- Score: 28.716825452690173
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Consistency models have been proposed for fast generative modeling, achieving results competitive with diffusion and flow models. However, these methods exhibit inherent instability and limited reproducibility when training from scratch, motivating subsequent work to explain and stabilize these issues. While these efforts have provided valuable insights, the explanations remain fragmented, and the theoretical relationships remain unclear. In this work, we provide a theoretical examination of consistency models by analyzing them from a flow map-based perspective. This joint analysis clarifies how training stability and convergence behavior can give rise to degenerate solutions. Building on these insights, we revisit self-distillation as a practical remedy for certain forms of suboptimal convergence and reformulate it to avoid excessive gradient norms for stable optimization. We further demonstrate that our strategy extends beyond image generation to diffusion-based policy learning, without reliance on a pretrained diffusion model for initialization, thereby illustrating its broader applicability.
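The abstract's key mechanics (a consistency-style objective with a self-distillation target, plus gradient-norm control for stable optimization) can be illustrated with a toy sketch. Everything below is hypothetical: the linear model, the 1-D data, and all hyperparameters are illustrative stand-ins, not the paper's actual method, which trains a neural flow map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "consistency model" f(x, t) = w*x + b*t (a schematic stand-in
# for a neural network that maps a noisy point back toward data).
def f(params, x, t):
    w, b = params
    return w * x + b * t

def consistency_loss(params, teacher_params, x0, t_hi, t_lo, noise):
    # Two points on the same forward trajectory x_t = x0 + t * noise.
    x_hi = x0 + t_hi * noise
    x_lo = x0 + t_lo * noise
    # Student evaluated at the higher noise level; the teacher's output
    # at the lower level is treated as a constant (stop-gradient) target.
    pred = f(params, x_hi, t_hi)
    target = f(teacher_params, x_lo, t_lo)
    return np.mean((pred - target) ** 2)

def grad(params, teacher_params, x0, t_hi, t_lo, noise, eps=1e-5):
    # Central finite differences -- good enough for a 2-parameter toy.
    g = np.zeros_like(params)
    for i in range(len(params)):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        g[i] = (consistency_loss(p_plus, teacher_params, x0, t_hi, t_lo, noise)
                - consistency_loss(p_minus, teacher_params, x0, t_hi, t_lo, noise)) / (2 * eps)
    return g

params = np.array([0.5, 0.5])
teacher = params.copy()            # self-distillation: teacher tracks the student
max_norm, lr, ema = 1.0, 0.05, 0.99

for step in range(200):
    x0 = rng.normal(2.0, 0.1, size=64)   # toy data distribution
    noise = rng.normal(size=64)
    g = grad(params, teacher, x0, t_hi=1.0, t_lo=0.5, noise=noise)
    norm = np.linalg.norm(g)
    if norm > max_norm:                  # cap excessive gradient norms
        g *= max_norm / norm
    params -= lr * g
    teacher = ema * teacher + (1 - ema) * params  # EMA teacher update
```

The two stabilizers the abstract alludes to appear as the stop-gradient/EMA teacher (the self-distillation target) and the norm clipping on `g`; without them, self-referential targets of this kind are known to admit degenerate solutions.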
Related papers
- Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss [56.120591983649824]
We present a theoretical analysis of diffusion and autoregressive models with diffusion loss. We show that patch denoising optimization in autoregressive models effectively mitigates condition errors and leads to a stable condition distribution. We introduce a novel condition refinement approach based on Optimal Transport (OT) theory to address condition inconsistency.
arXiv Detail & Related papers (2026-02-02T07:48:04Z)
- A PDE Perspective on Generative Diffusion Models [8.328108675535562]
We develop a rigorous partial differential equation (PDE) framework for score-based diffusion processes. We derive sharp $L^p$-stability estimates for the associated score-based Fokker-Planck dynamics. The results yield a theoretical guarantee that, under exact guidance, diffusion trajectories return to the data manifold.
arXiv Detail & Related papers (2025-11-08T09:19:25Z)
- Provable Maximum Entropy Manifold Exploration via Diffusion Models [58.89696361871563]
Exploration is critical for solving real-world decision-making problems such as scientific discovery. We introduce a novel framework that casts exploration as maximizing entropy over the approximate data manifold implicitly defined by a pre-trained diffusion model. We develop an algorithm based on mirror descent that solves the exploration problem as sequential fine-tuning of a pre-trained diffusion model.
arXiv Detail & Related papers (2025-06-18T11:59:15Z)
- Solving Inverse Problems with FLAIR [68.87167940623318]
We present FLAIR, a training-free variational framework that leverages flow-based generative models as a prior for inverse problems. Results on standard imaging benchmarks demonstrate that FLAIR consistently outperforms existing diffusion- and flow-based methods in terms of reconstruction quality and sample diversity.
arXiv Detail & Related papers (2025-06-03T09:29:47Z)
- Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods [11.695512384798299]
Supervised fine-tuning is the dominant approach for adapting foundation models to specialized tasks. In vision models, ensembling a pretrained model with its fine-tuned counterpart has been shown to mitigate this issue. We observe an overadaptation phenomenon: the ensemble model not only retains general knowledge from the foundation model but also outperforms the fine-tuned model even on the fine-tuning domain itself.
arXiv Detail & Related papers (2025-06-02T17:23:16Z)
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
- Elucidating Flow Matching ODE Dynamics with Respect to Data Geometries and Denoisers [10.947094609205765]
Flow matching (FM) models extend ODE-sampler-based diffusion models into a general framework. A rigorous theoretical analysis of FM models is essential for sample quality, stability, and broader applicability. In this paper, we advance the theory of FM models through a comprehensive analysis of sample trajectories.
arXiv Detail & Related papers (2024-12-25T01:17:15Z)
- The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [31.8260779160424]
We investigate how popular algorithms perform as the learned dynamics model is improved. We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z)
- Towards a mathematical theory for consistency training in diffusion models [17.632123036281957]
This paper takes a first step towards establishing theoretical underpinnings for consistency models.
We demonstrate that, in order to generate samples within $\varepsilon$ proximity to the target in distribution, it suffices for the number of steps in consistency learning to exceed the order of $d^{5/2}/\varepsilon$, where $d$ is the data dimension.
Our theory offers rigorous insights into the validity and efficacy of consistency models, illuminating their utility in downstream inference tasks.
arXiv Detail & Related papers (2024-02-12T17:07:02Z)
- Unmasking Bias in Diffusion Model Training [40.90066994983719]
Denoising diffusion models have emerged as a dominant approach for image generation.
They still suffer from slow convergence in training and color shift issues in sampling.
In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm.
arXiv Detail & Related papers (2023-10-12T16:04:41Z)
- On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization [68.13034137660334]
We propose theoretical connections between three recent "consistency" notions designed to enhance diffusion models for distinct objectives.
Our insights offer the potential for a more comprehensive and encompassing framework for consistency-type models.
arXiv Detail & Related papers (2023-06-01T05:57:40Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
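The claim in the last entry, that integration error drives instability and that higher-order ODE solvers help, is easy to demonstrate on a toy bilinear game, a standard stand-in for adversarial (GAN-like) dynamics. The vector field and step sizes below are illustrative choices, not taken from the paper.

```python
import numpy as np

# Toy adversarial dynamics: the bilinear game's continuous flow
#   d/dt (x, y) = (-y, x)
# rotates around the equilibrium (0, 0). Explicit Euler spirals
# outward (its integration error mimics "training instability"),
# while a classical Runge-Kutta (RK4) step stays near the orbit.
def field(z):
    x, y = z
    return np.array([-y, x])

def euler_step(z, h):
    return z + h * field(z)

def rk4_step(z, h):
    k1 = field(z)
    k2 = field(z + 0.5 * h * k1)
    k3 = field(z + 0.5 * h * k2)
    k4 = field(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

h, steps = 0.1, 500
z_euler = np.array([1.0, 0.0])
z_rk4 = np.array([1.0, 0.0])
for _ in range(steps):
    z_euler = euler_step(z_euler, h)
    z_rk4 = rk4_step(z_rk4, h)

# Euler's distance from the unit orbit grows by a factor
# sqrt(1 + h^2) per step; RK4's drift is orders of magnitude smaller.
```

After 500 steps, `np.linalg.norm(z_euler)` has grown far beyond 1 while `np.linalg.norm(z_rk4)` remains essentially on the unit circle, which is the qualitative point of the paper's ODE-solver view of GAN training.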
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.