Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data
- URL: http://arxiv.org/abs/2603.03700v1
- Date: Wed, 04 Mar 2026 03:59:02 GMT
- Title: Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data
- Authors: Saptarshi Chakraborty, Quentin Berthet, Peter L. Bartlett,
- Abstract summary: We study the statistical convergence of score-based diffusion models for learning an unknown distribution $$ from finitely many samples.<n>Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data.<n>Our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport.
- Score: 32.72306410557258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $μ$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $μ$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d.\ samples from $μ$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hatμ$ and $μ$ scales as $\mathbb{E}\, \mathbb{W}_p(\hatμ,μ) = \widetilde{O}\!\left(n^{-1 / d^\ast_{p,q}(μ)}\right),$ where $d^\ast_{p,q}(μ)$ is the $(p,q)$-Wasserstein dimension of $μ$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(μ)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends classical Wasserstein dimension notions to distributions with unbounded support, which may be of independent theoretical interest.
Related papers
- Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models [65.71506381302815]
We propose amortize the cost of sampling from a posterior distribution of the form $p(mathbfxmidmathbfy) propto p_theta(mathbfx)$.<n>For many models and constraints, the posterior in noise space is smoother than in data space, making it more suitable for amortized inference.
arXiv Detail & Related papers (2025-02-10T19:49:54Z) - Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV)<n>We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $d/varepsilon$ are sufficient to approximate the target distribution to within $varepsilon$ total-variation distance.
Our results also characterize how $ell$ score estimation errors affect the quality of the data generation processes.
arXiv Detail & Related papers (2024-08-05T09:02:24Z) - Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives.<n>We prove the total variation between the target and the generated data distributions can be bounded above by $mathcalO(d3/4delta1/2)$ in the continuous time level.
arXiv Detail & Related papers (2024-04-15T12:29:28Z) - Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions [11.222970035173372]
kernel-based score estimator achieves an optimal mean square error of $widetildeOleft(n-1 t-fracd+22(tfracd2 vee 1)right)
We show that a kernel-based score estimator achieves an optimal mean square error of $widetildeOleft(n-1/2 t-fracd4right)$ upper bound for the total variation error of the distribution of the sample generated by the diffusion model under a mere sub-Gaussian
arXiv Detail & Related papers (2024-02-23T20:51:31Z) - On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension [32.98264375121064]
We derive statistical guarantees on the estimated densities in terms of the intrinsic dimension of the data and the latent space.<n>We demonstrate that GANs can effectively achieve the minimax optimal rate even for non-smooth underlying distributions.
arXiv Detail & Related papers (2024-01-28T23:18:10Z) - Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic
Localization [40.808942894229325]
We provide the first convergence bounds which are linear in the data dimension.
We show that diffusion models require at most $tilde O(fracd log2(1/delta)varepsilon2)$ steps to approximate an arbitrary distribution.
arXiv Detail & Related papers (2023-08-07T16:01:14Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $mathbbRd$ denoted by $mathcalS$, and the intrinsic dimension $mathcalS$ can be characterized by a new complexity notation -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.