Related papers: Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study

URL: http://arxiv.org/abs/2505.22841v1
Date: Wed, 28 May 2025 20:22:18 GMT
Title: Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study
Authors: Franck Gabriel, François Ged, Maria Han Veiga, Emmanuel Schertzer
Abstract summary: Diffusion models can be prone to memorization. Regularization on the score has the same effect as increasing the size of the training dataset. This perspective highlights two regularization mechanisms taking place in denoising diffusions.
Score: 3.265950484493743
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Diffusion models now set the benchmark in high-fidelity generative sampling, yet they can, in principle, be prone to memorization. In this case, their learned score overfits the finite dataset so that the reverse-time SDE samples are mostly training points. In this paper, we interpret the empirical score as a noisy version of the true score and show that its covariance matrix is asymptotically a re-weighted data PCA. In large dimension, the small time limit makes the noise variance blow up while simultaneously reducing spatial correlation. To reduce this variance, we introduce a kernel-smoothed empirical score and analyze its bias-variance trade-off. We derive asymptotic bounds on the Kullback-Leibler divergence between the true distribution and the one generated by the modified reverse SDE. Regularization on the score has the same effect as increasing the size of the training dataset, and thus helps prevent memorization. A spectral decomposition of the forward diffusion suggests better variance control under some regularity conditions of the true data distribution. Reverse diffusion with kernel-smoothed empirical score can be reformulated as a gradient descent drifted toward a Log-Exponential Double-Kernel Density Estimator (LED-KDE). This perspective highlights two regularization mechanisms taking place in denoising diffusions: an initial Gaussian kernel first diffuses mass isotropically in the ambient space, while a second kernel applied in score space concentrates and spreads that mass along the data manifold. Hence, even a straightforward regularization, without any learning, already mitigates memorization and enhances generalization. Numerically, we illustrate our results with several experiments on synthetic and MNIST datasets.
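
The kernel-smoothed empirical score mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a forward process that simply adds Gaussian noise of standard deviation sigma to the training points, a Gaussian smoothing kernel, and a Monte Carlo average in place of an exact convolution. The function names and the `bandwidth` and `n_mc` parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def empirical_score(x, data, sigma):
    """Score of the Gaussian-smoothed empirical distribution at x.

    Assumes p_sigma(x) = (1/n) * sum_i N(x; x_i, sigma^2 I), so that
    grad log p_sigma(x) = sum_i w_i(x) * (x_i - x) / sigma^2,
    with w_i a softmax over -|x - x_i|^2 / (2 sigma^2).
    """
    logits = [
        -sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * sigma ** 2)
        for xi in data
    ]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [
        sum(w * (xi[d] - x[d]) for w, xi in zip(weights, data))
        / (total * sigma ** 2)
        for d in range(len(x))
    ]


def kernel_smoothed_score(x, data, sigma, bandwidth, n_mc=256, rng=None):
    """Monte Carlo estimate of the empirical score convolved with a
    Gaussian kernel of standard deviation `bandwidth` (an assumed choice).
    """
    rng = rng or random.Random(0)
    acc = [0.0] * len(x)
    for _ in range(n_mc):
        # Perturb the query point, then evaluate the empirical score there.
        y = [xd + bandwidth * rng.gauss(0.0, 1.0) for xd in x]
        s = empirical_score(y, data, sigma)
        for d in range(len(x)):
            acc[d] += s[d]
    return [a / n_mc for a in acc]


if __name__ == "__main__":
    points = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    query = [0.9, 0.1]
    print(empirical_score(query, points, sigma=0.2))
    print(kernel_smoothed_score(query, points, sigma=0.2, bandwidth=0.3))
```

At small sigma the unsmoothed score points almost entirely toward the nearest training point, which is the memorization regime the abstract describes. Averaging over perturbed query points reduces that variance at the cost of some bias, and a bandwidth of zero recovers the unsmoothed empirical score.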

Related papers

Learning What Matters: Steering Diffusion via Spectrally Anisotropic Forward Noise [43.07594740645669]
Diffusion Probabilistic Models (DPMs) have achieved strong generative performance, yet their inductive biases remain largely implicit. In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. We introduce an anisotropic noise operator that shapes these biases by replacing the isotropic forward covariance with a structured, frequency-diagonal covariance.
arXiv Detail & Related papers (2025-10-07T16:08:39Z)
Authentic Discrete Diffusion Model [72.31371542619121]
The Authentic Discrete Diffusion (ADD) framework redefines prior pseudo-discrete approaches. ADD reformulates the diffusion input by directly using float-encoded one-hot class data. Experiments demonstrate that ADD achieves superior performance on classification tasks compared to the baseline.
arXiv Detail & Related papers (2025-10-01T15:51:10Z)
Beyond Scores: Proximal Diffusion Models [10.27283386401996]
We develop Proximal Diffusion Models (ProxDM) to learn proximal operators of the log-density. We show that two variants of ProxDM achieve significantly faster within just a few sampling steps compared to conventional score-matching methods.
arXiv Detail & Related papers (2025-07-11T18:30:09Z)
An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models [29.972063833424215]
We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. We integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution.
arXiv Detail & Related papers (2025-03-05T05:50:38Z)
Ensemble Kalman filter in latent space using a variational autoencoder pair [0.2383122657918106]
A variational autoencoder (VAE) is a machine learning (ML) technique that maps an arbitrary distribution to and from a latent space. We propose a novel hybrid DA-ML approach in which VAEs are incorporated in the DA procedure.
arXiv Detail & Related papers (2025-02-18T16:11:05Z)
On the Wasserstein Convergence and Straightness of Rectified Flow [54.580605276017096]
Rectified Flow (RF) is a generative model that aims to learn straight flow trajectories from noise to data. We provide a theoretical analysis of the Wasserstein distance between the sampling distribution of RF and the target distribution. We present general conditions guaranteeing uniqueness and straightness of 1-RF, which is in line with previous empirical findings.
arXiv Detail & Related papers (2024-10-19T02:36:11Z)
On the Relation Between Linear Diffusion and Power Iteration [42.158089783398616]
We study the generation process as a "correlation machine". We show that low frequencies emerge earlier in the generation process, where the denoising basis vectors are more aligned to the true data with a rate depending on their eigenvalues. This model allows us to show that the linear diffusion model converges in mean to the leading eigenvector of the underlying data, similarly to the prevalent power iteration method.
arXiv Detail & Related papers (2024-10-16T07:33:12Z)
Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction. We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution. We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data. Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
Denoising Diffusion Samplers [41.796349001299156]
Denoising diffusion models are a popular class of generative models providing state-of-the-art results in many domains. We explore a similar idea to sample approximately from unnormalized probability density functions and estimate their normalizing constants. While score matching is not applicable in this context, we can leverage many of the ideas introduced in generative modeling for Monte Carlo sampling.
arXiv Detail & Related papers (2023-02-27T14:37:16Z)
Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain. We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions. We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
From Denoising Diffusions to Denoising Markov Models [38.33676858989955]
Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. We propose a unifying framework generalising this approach to a wide class of spaces and leading to an original extension of score matching.
arXiv Detail & Related papers (2022-11-07T14:34:27Z)
On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD). We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings. We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime [32.65347128465841]
Deep neural networks can achieve remarkable performance while interpolating the training data perfectly. Rather than the U-curve of the bias-variance trade-off, their test error often follows a "double descent". We develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks.
arXiv Detail & Related papers (2020-03-02T17:39:31Z)