Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning Curves
- URL: http://arxiv.org/abs/2502.00336v2
- Date: Tue, 07 Oct 2025 19:37:10 GMT
- Title: Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning Curves
- Authors: Anand Jerry George, Rodrigo Veiga, Nicolas Macris
- Abstract summary: Empirical studies suggest that generalization and memorization are influenced by model complexity and the size of the training dataset. We capture these behaviors by deriving asymptotically precise expressions for test and train errors of Denoising Score Matching (DSM) under a simple theoretical setting. Our theoretical findings are consistent with the empirical observations.
- Score: 8.038035670274706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We theoretically investigate the phenomena of generalization and memorization in diffusion models. Empirical studies suggest that these phenomena are influenced by model complexity and the size of the training dataset. In our experiments, we further observe that the number of noise samples per data sample ($m$) used during Denoising Score Matching (DSM) plays a significant and non-trivial role. We capture these behaviors and shed insights into their mechanisms by deriving asymptotically precise expressions for test and train errors of DSM under a simple theoretical setting. The score function is parameterized by random features neural networks, with the target distribution being $d$-dimensional Gaussian. We operate in a regime where the dimension $d$, number of data samples $n$, and number of features $p$ tend to infinity while keeping the ratios $\psi_n=\frac{n}{d}$ and $\psi_p=\frac{p}{d}$ fixed. By characterizing the test and train errors, we identify regimes of generalization and memorization as a function of $\psi_n,\psi_p$, and $m$. Our theoretical findings are consistent with the empirical observations.
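The setting described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes a standard-Gaussian target, a single fixed diffusion time `t` under an Ornstein-Uhlenbeck forward process, a `tanh` activation, and a ridge penalty `lam`. The function name `dsm_random_features_errors` and all default values are hypothetical. Only the second-layer weights `A` are trained, in closed form, using `m` noise draws per data sample.

```python
import numpy as np


def dsm_random_features_errors(n=40, d=20, p=60, m=4, t=0.5, lam=1e-3, seed=0):
    """Fit a random-features score model with DSM at one noise level.

    Returns (train_err, test_err, baseline), all normalised by d.
    Every modelling choice here is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)

    # Assumed forward process: x_t = a_t * x + sqrt(h_t) * z (Ornstein-Uhlenbeck).
    a_t = np.exp(-t)
    h_t = 1.0 - np.exp(-2.0 * t)

    # Frozen random first layer; only the second layer A is trained.
    W = rng.standard_normal((p, d))

    def features(Y):
        # Random features sigma(W y / sqrt(d)) / sqrt(p), shape (N, p).
        return np.tanh(Y @ W.T / np.sqrt(d)) / np.sqrt(p)

    def noisy_batch(X, reps):
        # Draw `reps` independent noise vectors for every data sample.
        X_rep = np.repeat(X, reps, axis=0)
        Z = rng.standard_normal(X_rep.shape)
        return a_t * X_rep + np.sqrt(h_t) * Z, Z

    def dsm_loss(A, X_t, Z):
        # DSM objective: mean of || sqrt(h_t) * s(x_t) + z ||^2, divided by d.
        resid = np.sqrt(h_t) * features(X_t) @ A + Z
        return np.mean(np.sum(resid**2, axis=1)) / d

    # Training set: n samples from N(0, I_d), each paired with m noise draws.
    X_train = rng.standard_normal((n, d))
    X_t, Z = noisy_batch(X_train, m)
    Phi = features(X_t)
    N = Phi.shape[0]

    # Closed-form ridge minimiser of (1/N)||sqrt(h_t) Phi A + Z||_F^2 + lam ||A||_F^2.
    gram = h_t * Phi.T @ Phi / N + lam * np.eye(p)
    A = -np.linalg.solve(gram, np.sqrt(h_t) * Phi.T @ Z / N)

    train_err = dsm_loss(A, X_t, Z)
    baseline = dsm_loss(np.zeros((p, d)), X_t, Z)  # loss of the zero score

    # Test error on fresh data with one fresh noise draw per sample.
    X_test = rng.standard_normal((2000, d))
    X_t_test, Z_test = noisy_batch(X_test, 1)
    test_err = dsm_loss(A, X_t_test, Z_test)
    return train_err, test_err, baseline


if __name__ == "__main__":
    for m in (1, 4, 16):
        tr, te, _ = dsm_random_features_errors(m=m)
        print(f"m={m:2d}  train={tr:.3f}  test={te:.3f}")
```

Sweeping `m`, `n / d`, and `p / d` in this sketch is one way to see how the gap between train and test error moves. The paper's precise curves come from its asymptotic analysis, not from finite-size simulations like this one.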
Related papers
- Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models [13.312007032203857]
Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain. By reusing information from previous generations, we get an anytime algorithm that turns additional compute into steadily better samples.
arXiv Detail & Related papers (2025-06-25T17:59:10Z) - Spatial Reasoning with Denoising Models [49.83744014336816]
We introduce a framework to perform reasoning over sets of continuous variables via denoising generative models. We demonstrate for the first time that the order of generation can be successfully predicted by the denoising network itself. Using these findings, we can increase the accuracy of specific reasoning tasks from 1% to >50%.
arXiv Detail & Related papers (2025-02-28T14:08:30Z) - Dimension-free Score Matching and Time Bootstrapping for Diffusion Models [19.62665684173391]
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. We introduce a martingale-based error decomposition and sharp variance bounds, enabling efficient learning from dependent data. Building on these insights, we propose Bootstrapped Score Matching (BSM), a variance reduction technique that leverages previously learned scores to improve accuracy at higher noise levels.
arXiv Detail & Related papers (2025-02-14T18:32:22Z) - Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$. We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from
arXiv Detail & Related papers (2024-05-31T16:18:46Z) - Random features and polynomial rules [0.0]
We present an analysis of the generalization performance of random features models for generic supervised learning problems with Gaussian data.
We find good agreement even far from the limits where $D\to\infty$ and at least one of $P/D^K$, $N/D^L$ remains finite.
arXiv Detail & Related papers (2024-02-15T18:09:41Z) - Diffusion Random Feature Model [0.0]
We present a diffusion model-inspired deep random feature model that is interpretable.
We derive generalization bounds between the distribution of sampled data and the true distribution using properties of score matching.
We validate our findings by generating samples on the fashion MNIST dataset and instrumental audio data.
arXiv Detail & Related papers (2023-10-06T17:59:05Z) - Kernel-Based Tests for Likelihood-Free Hypothesis Testing [21.143798051525646]
Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to one of the two classes.
Special cases of this problem are well-known: when $m=1$ it corresponds to binary classification, and when $m\approx n$ it is equivalent to two-sample testing.
In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed.
arXiv Detail & Related papers (2023-08-17T15:24:03Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation, the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Score-based Diffusion Models in Function Space [137.70916238028306]
Diffusion models have recently emerged as a powerful framework for generative modeling. This work introduces a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z) - Convergence for score-based generative modeling with polynomial complexity [9.953088581242845]
We prove the first convergence guarantees for the core mechanic behind score-based generative modeling.
Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality.
We show that a predictor-corrector gives better convergence than using either portion alone.
arXiv Detail & Related papers (2022-06-13T14:57:35Z) - Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression [41.48538038768993]
We focus on the problem of kernel ridge regression for dot-product kernels.
We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descent and nontrivial behavior at multiple scales.
arXiv Detail & Related papers (2022-05-30T04:21:31Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Minimax Estimation of Partially-Observed Vector AutoRegressions [0.0]
We study the properties of a partially-observed state-space model.
We describe a sparse estimator based on the Dantzig selector and upper bound its non-asymptotic error.
An application to open railway data highlights the relevance of this model for public transport traffic analysis.
arXiv Detail & Related papers (2021-06-17T08:46:53Z) - Locality defeats the curse of dimensionality in convolutional teacher-student scenarios [69.2027612631023]
We show that locality is key in determining the learning curve exponent $\beta$.
We conclude by proving, using a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
arXiv Detail & Related papers (2021-06-16T08:27:31Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev-seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - More data or more parameters? Investigating the effect of data structure on generalization [17.249712222764085]
Properties of data impact the test error as a function of the number of training examples and number of training parameters.
We show that noise in the labels and strong anisotropy of the input data play similar roles on the test error.
arXiv Detail & Related papers (2021-03-09T16:08:41Z) - System Identification Through Lipschitz Regularized Deep Neural Networks [0.4297070083645048]
We use neural networks to learn governing equations from data.
We reconstruct the right-hand side of a system of ODEs $\dot{x}(t) = f(t, x(t))$ directly from observed uniformly time-sampled data.
arXiv Detail & Related papers (2020-09-07T17:52:51Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)