Smoothing the Score Function for Generalization in Diffusion Models: An Optimization-based Explanation Framework
- URL: http://arxiv.org/abs/2601.19285v1
- Date: Tue, 27 Jan 2026 07:16:44 GMT
- Title: Smoothing the Score Function for Generalization in Diffusion Models: An Optimization-based Explanation Framework
- Authors: Xinyu Zhou, Jiawei Zhang, Stephen J. Wright
- Abstract summary: Diffusion models achieve remarkable generation quality, yet face a fundamental challenge known as memorization. We develop a theoretical framework to explain this phenomenon by showing that the empirical score function is a weighted sum of the score functions of Gaussian distributions. In practice, approximating the empirical score function with a neural network can partially alleviate this issue and improve generalization.
- Score: 18.032864089341327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models achieve remarkable generation quality, yet face a fundamental challenge known as memorization, where generated samples can replicate training samples exactly. We develop a theoretical framework to explain this phenomenon by showing that the empirical score function (the score function corresponding to the empirical distribution) is a weighted sum of the score functions of Gaussian distributions, in which the weights are sharp softmax functions. This structure causes individual training samples to dominate the score function, resulting in sampling collapse. In practice, approximating the empirical score function with a neural network can partially alleviate this issue and improve generalization. Our theoretical framework explains why: In training, the neural network learns a smoother approximation of the weighted sum, allowing the sampling process to be influenced by local manifolds rather than single points. Leveraging this insight, we propose two novel methods to further enhance generalization: (1) Noise Unconditioning enables each training sample to adaptively determine its score function weight to increase the effect of more training samples, thereby preventing single-point dominance and mitigating collapse. (2) Temperature Smoothing introduces an explicit parameter to control the smoothness. By increasing the temperature in the softmax weights, we naturally reduce the dominance of any single training sample and mitigate memorization. Experiments across multiple datasets validate our theoretical analysis and demonstrate the effectiveness of the proposed methods in improving generalization while maintaining high generation quality.
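The structure described in the abstract can be made concrete. For a training set {x_i} and a forward process that maps x_i to a Gaussian N(alpha_t x_i, sigma_t^2 I), the empirical score at a point x is a softmax-weighted combination of the individual Gaussian scores, and a temperature in that softmax controls how sharply a single training sample dominates. The sketch below, in NumPy, illustrates this structure under a VP-style parameterization; the function name and the exact placement of the temperature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def empirical_score(x, train_samples, alpha_t, sigma_t, temperature=1.0):
    """Score of the empirical (Gaussian-smoothed) training distribution at x.

    Assumes a VP-style forward process x_t = alpha_t * x_0 + sigma_t * eps, so the
    empirical marginal at time t is a uniform mixture of N(alpha_t * x_i, sigma_t^2 I).
    Its score is a softmax-weighted sum of the individual Gaussian scores:
        score(x) = sum_i w_i(x) * (alpha_t * x_i - x) / sigma_t^2,
        w_i(x)   = softmax_i(-||x - alpha_t * x_i||^2 / (2 * sigma_t^2 * temperature)).
    temperature = 1 gives the exact empirical score, whose sharp weights let a single
    training sample dominate; temperature > 1 flattens the weights (the "temperature
    smoothing" idea), so nearby samples share influence and memorization is reduced.
    """
    diffs = alpha_t * np.asarray(train_samples) - np.asarray(x)  # (n, d): means minus query
    sq_dists = np.sum(diffs ** 2, axis=1)                        # (n,) squared distances
    logits = -sq_dists / (2.0 * sigma_t ** 2 * temperature)
    logits -= logits.max()                                       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return (weights[:, None] * diffs).sum(axis=0) / sigma_t ** 2
```

With a few training points one can check that, as sigma_t shrinks and temperature = 1, the weights collapse onto the nearest alpha_t * x_i, which is exactly the single-point dominance the paper associates with memorization.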
Related papers
- A Classical View on Benign Overfitting: The Role of Sample Size [14.36840959836957]
We focus on almost benign overfitting, where models simultaneously achieve both arbitrarily small training and test errors. This behavior is characteristic of neural networks, which often achieve low (but non-zero) training error while still generalizing well.
arXiv Detail & Related papers (2025-05-16T18:37:51Z) - On the Interpolation Effect of Score Smoothing in Diffusion Models [12.335698325757493]
We study the hypothesis that such creativity arises from a smoothing of the empirical score function. We show theoretically how regularized two-layer ReLU neural networks tend to learn approximately a smoothed version of the empirical score function. We present experimental evidence that learning score functions with neural networks indeed induces a score smoothing effect.
arXiv Detail & Related papers (2025-02-26T19:04:01Z) - Dimension-free Score Matching and Time Bootstrapping for Diffusion Models [19.62665684173391]
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. We introduce a martingale-based error decomposition and sharp variance bounds, enabling efficient learning from dependent data. Building on these insights, we propose Bootstrapped Score Matching (BSM), a variance reduction technique that leverages previously learned scores to improve accuracy at higher noise levels.
arXiv Detail & Related papers (2025-02-14T18:32:22Z) - Memorization and Regularization in Generative Diffusion Models [5.128303432235475]
Diffusion models have emerged as a powerful framework for generative modeling. The analysis highlights the need for regularization to avoid reproducing the analytically tractable minimizer. Experiments are evaluated in the context of memorization, and directions for future development of regularization are highlighted.
arXiv Detail & Related papers (2025-01-27T05:17:06Z) - Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z) - The Unreasonable Effectiveness of Gaussian Score Approximation for Diffusion Models and its Applications [1.8416014644193066]
We compare learned neural scores to the scores of two kinds of analytically tractable distributions. We claim that the learned neural score is dominated by its linear (Gaussian) approximation for moderate to high noise scales. We show that this allows the skipping of the first 15-30% of sampling steps while maintaining high sample quality.
arXiv Detail & Related papers (2024-12-12T21:31:27Z) - Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
arXiv Detail & Related papers (2024-02-23T08:54:42Z) - Generative Diffusion From An Action Principle [0.0]
We show that score matching can be derived from an action principle, like the ones commonly used in physics.
We use this insight to demonstrate the connection between different classes of diffusion models.
arXiv Detail & Related papers (2023-10-06T18:00:00Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Sensing Cox Processes via Posterior Sampling and Positive Bases [56.82162768921196]
We study adaptive sensing of point processes, a widely used model from spatial statistics.
We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis.
Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (Cox-Thompson) and top-two posterior sampling (Top2) principles.
arXiv Detail & Related papers (2021-10-21T14:47:06Z) - Robust Sampling in Deep Learning [62.997667081978825]
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem by a new regularization method based on distributional robust optimization.
During training, samples are selected according to their accuracy so that the worst-performing samples contribute the most to the optimization.
arXiv Detail & Related papers (2020-06-04T09:46:52Z)