Improved Analysis of Score-based Generative Modeling: User-Friendly
Bounds under Minimal Smoothness Assumptions
- URL: http://arxiv.org/abs/2211.01916v1
- Date: Thu, 3 Nov 2022 15:51:00 GMT
- Title: Improved Analysis of Score-based Generative Modeling: User-Friendly
Bounds under Minimal Smoothness Assumptions
- Authors: Hongrui Chen, Holden Lee, Jianfeng Lu
- Abstract summary: We provide convergence guarantees with polynomial complexity for any data distribution with a finite second moment.
Our result does not rely on any log-concavity or functional inequality assumption.
Our theoretical analysis provides a quantitative comparison between different discrete approximations and may guide the choice of discretization points in practice.
- Score: 9.953088581242845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the theoretical analysis of diffusion-based
generative modeling. Under an $L^2$-accurate score estimator, we provide
convergence guarantees with polynomial complexity for any data distribution
with a finite second moment, by either employing an early stopping technique or
assuming a smoothness condition on the score function of the data distribution.
Our result does not rely on any log-concavity or functional inequality
assumption and has a logarithmic dependence on the smoothness. In particular,
we show that under only a finite second moment condition, approximating the
following to $\epsilon$-accuracy in KL divergence can be done in $\tilde
O\left(\frac{d^2 \log^2 (1/\delta)}{\epsilon^2}\right)$ steps: 1) the
variance-$\delta$ Gaussian perturbation of any data distribution; 2) data
distributions with $1/\delta$-smooth score functions. Our theoretical analysis
also provides a quantitative comparison between different discrete approximations
and may guide the choice of discretization points in practice.
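To make the role of early stopping and discretization-point selection more concrete, here is a minimal, hypothetical sketch (not code from the paper): an Euler-Maruyama discretization of the reverse SDE for an Ornstein-Uhlenbeck forward process, using exponentially spaced time points and stopping at time $\delta$ so that the sampler targets a small Gaussian perturbation of the data distribution rather than the data distribution itself. The names `reverse_sampler` and `score_fn` are assumptions made for illustration; `score_fn` stands in for an $L^2$-accurate score estimate.

```python
import numpy as np

def reverse_sampler(score_fn, dim, T=5.0, delta=1e-3, n_steps=500, rng=None):
    """Hypothetical early-stopped reverse-diffusion sampler (illustration only).

    Forward process: Ornstein-Uhlenbeck, dX_t = -X_t dt + sqrt(2) dB_t, so the
    reverse-time drift is x + 2 * score(x, t).  `score_fn(x, t)` is assumed to
    approximate the score of the law of X_t in the L^2 sense.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Exponentially spaced discretization points from T down to delta:
    # step sizes shrink near t = 0, where the score varies most rapidly.
    ts = np.geomspace(T, delta, n_steps + 1)
    x = rng.standard_normal(dim)  # initialize from the stationary N(0, I)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        h = t_cur - t_next  # positive, geometrically shrinking step size
        drift = x + 2.0 * score_fn(x, t_cur)
        x = x + h * drift + np.sqrt(2.0 * h) * rng.standard_normal(dim)
    # Stopping at t = delta returns a sample whose law is a Gaussian
    # perturbation of the data distribution with variance of order delta.
    return x
```

Stopping at $t = \delta$ instead of $t = 0$ mirrors the early-stopping alternative to assuming a smooth data score, and the geometric spacing of the steps over $[\delta, T]$ is one natural way a $\log(1/\delta)$ factor can arise from the choice of discretization points.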
Related papers
- Improved Convergence Rate for Diffusion Probabilistic Models [7.237817437521988]
Score-based diffusion models have achieved remarkable empirical performance in the field of machine learning and artificial intelligence.
Despite many theoretical attempts, there still exists a significant gap between theory and practice.
We establish an iteration complexity at the order of $d^{2/3}\varepsilon^{-2/3}$, which is better than $d^{5/12}\varepsilon^{-1}$.
Our theory accommodates $\varepsilon$-accurate score estimates, and does not require log-concavity on the target distribution.
arXiv Detail & Related papers (2024-10-17T16:37:33Z) - A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $O(d/\varepsilon)$ iterations are sufficient to approximate the target distribution to within $\varepsilon$ total-variation distance.
Our results also characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes.
arXiv Detail & Related papers (2024-08-05T09:02:24Z) - Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives.
We prove that the total variation between the target and the generated data distributions can be bounded above by $\mathcal{O}(d^{3/4}\delta^{1/2})$ at the continuous-time level.
arXiv Detail & Related papers (2024-04-15T12:29:28Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We demonstrate, analytically and numerically, that the analysis can be used to guide the preferred choice of setting for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic
Localization [40.808942894229325]
We provide the first convergence bounds which are linear in the data dimension.
We show that diffusion models require at most $\tilde O\left(\frac{d \log^2(1/\delta)}{\varepsilon^2}\right)$ steps to approximate an arbitrary distribution.
arXiv Detail & Related papers (2023-08-07T16:01:14Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimal predictor perfectly fits the inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the feature-space dimension $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.