A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions
- URL: http://arxiv.org/abs/2508.16306v1
- Date: Fri, 22 Aug 2025 11:29:06 GMT
- Title: A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions
- Authors: Nishant Jain, Tong Zhang,
- Abstract summary: Diffusion-based generative models have emerged as highly effective methods for synthesizing high-quality samples.<n>Recent works have focused on analyzing the convergence of their generation process with minimal assumptions.<n>We present a refined analysis that improves the dependence on $varepsilon$.
- Score: 20.628481624954187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based generative models have emerged as highly effective methods for synthesizing high-quality samples. Recent works have focused on analyzing the convergence of their generation process with minimal assumptions, either through reverse SDEs or Probability Flow ODEs. The best known guarantees, without any smoothness assumptions, for the KL divergence so far achieve a linear dependence on the data dimension $d$ and an inverse quadratic dependence on $\varepsilon$. In this work, we present a refined analysis that improves the dependence on $\varepsilon$. We model the generation process as a composition of two steps: a reverse ODE step, followed by a smaller noising step along the forward process. This design leverages the fact that the ODE step enables control in Wasserstein-type error, which can then be converted into a KL divergence bound via noise addition, leading to a better dependence on the discretization step size. We further provide a novel analysis to achieve the linear $d$-dependence for the error due to discretizing this Probability Flow ODE in absence of any smoothness assumptions. We show that $\tilde{O}\left(\tfrac{d\log^{3/2}(\frac{1}{\delta})}{\varepsilon}\right)$ steps suffice to approximate the target distribution corrupted with Gaussian noise of variance $\delta$ within $O(\varepsilon^2)$ in KL divergence, improving upon the previous best result, requiring $\tilde{O}\left(\tfrac{d\log^2(\frac{1}{\delta})}{\varepsilon^2}\right)$ steps.
Related papers
- Fast Convergence for High-Order ODE Solvers in Diffusion Probabilistic Models [5.939858158928473]
Diffusion probabilistic models generate samples by learning to reverse a noise-injection process that transforms data into noise.<n>A key development is the reformulation of the reverse sampling process as a deterministic probability flow ordinary differential equation (ODE)<n>We present a rigorous convergence analysis of deterministic samplers derived from ODEs for general forward processes with arbitrary variance schedules.
arXiv Detail & Related papers (2025-06-16T03:09:25Z) - Multi-Step Consistency Models: Fast Generation with Theoretical Guarantees [15.366598179769918]
We provide a theoretical analysis of consistency models capable of mapping inputs at a given time to arbitrary points along the reverse trajectory.<n>We show that one can achieve a KL divergence of order $ O(varepsilon2) $ using only $ Oleft(logleft(fracdvarepsilonright) $ iterations with a constant step size.<n>We conclude that accurate learning is feasible using small discretization steps, both in smooth and non-smooth settings.
arXiv Detail & Related papers (2025-05-02T06:50:46Z) - A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $d/varepsilon$ are sufficient to approximate the target distribution to within $varepsilon$ total-variation distance.
Our results also characterize how $ell$ score estimation errors affect the quality of the data generation processes.
arXiv Detail & Related papers (2024-08-05T09:02:24Z) - Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic
Localization [40.808942894229325]
We provide the first convergence bounds which are linear in the data dimension.
We show that diffusion models require at most $tilde O(fracd log2(1/delta)varepsilon2)$ steps to approximate an arbitrary distribution.
arXiv Detail & Related papers (2023-08-07T16:01:14Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Efficient Sampling of Stochastic Differential Equations with Positive
Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $varepsilon$ with a cost that is $m2 d log (1/varepsilon)$
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z) - Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic
Analysis For DDIM-Type Samplers [90.45898746733397]
We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling.
We show that one step along the probability flow ODE can be expressed as two steps: 1) a restoration step that runs ascent on the conditional log-likelihood at some infinitesimally previous time, and 2) a degradation step that runs the forward process using noise pointing back towards the current gradient.
arXiv Detail & Related papers (2023-03-06T18:59:19Z) - Improved Analysis of Score-based Generative Modeling: User-Friendly
Bounds under Minimal Smoothness Assumptions [9.953088581242845]
We provide convergence guarantees with complexity for any data distribution with second-order moment.
Our result does not rely on any log-concavity or functional inequality assumption.
Our theoretical analysis provides comparison between different discrete approximations and may guide the choice of discretization points in practice.
arXiv Detail & Related papers (2022-11-03T15:51:00Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.