Related papers: Generalization through variance: how noise shapes inductive biases in diffusion models

URL: http://arxiv.org/abs/2504.12532v1
Date: Wed, 16 Apr 2025 23:41:10 GMT
Title: Generalization through variance: how noise shapes inductive biases in diffusion models
Authors: John J. Vastola
Abstract summary: We develop a mathematical theory that partly explains the 'generalization through variance' phenomenon. We find that the distributions diffusion models effectively learn to sample from resemble their training distributions. We also characterize how this inductive bias interacts with feature-related inductive biases.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: How diffusion models generalize beyond their training set is not known, and is somewhat mysterious given two facts: the optimum of the denoising score matching (DSM) objective usually used to train diffusion models is the score function of the training distribution; and the networks usually used to learn the score function are expressive enough to learn this score to high accuracy. We claim that a certain feature of the DSM objective -- the fact that its target is not the training distribution's score, but a noisy quantity only equal to it in expectation -- strongly impacts whether and to what extent diffusion models generalize. In this paper, we develop a mathematical theory that partly explains this 'generalization through variance' phenomenon. Our theoretical analysis exploits a physics-inspired path integral approach to compute the distributions typically learned by a few paradigmatic under- and overparameterized diffusion models. We find that the distributions diffusion models effectively learn to sample from resemble their training distributions, but with 'gaps' filled in, and that this inductive bias is due to the covariance structure of the noisy target used during training. We also characterize how this inductive bias interacts with feature-related inductive biases.
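The abstract says the DSM target equals the training distribution's score only in expectation. This can be checked numerically in a toy setting. What follows is a minimal sketch, not the paper's path-integral analysis, and it rests on several assumptions:

- There is a single noise level sigma, with x_t = x0 + sigma*eps.
- x0 is drawn uniformly from a small, hypothetical 2-D training set.
- The function names (`log_density`, `posterior_weights`, `dsm_target_moments`, `numerical_score`) are illustrative only.

For a query point x_t, the per-sample target (x0 - x_t)/sigma^2 is averaged over the posterior on x0. That average should match a finite-difference gradient of the log of the Gaussian-smoothed training density. The target's covariance is also reported.

```python
import numpy as np

def log_density(x, data, sigma):
    """log p_sigma(x), where p_sigma = (1/N) sum_i N(x; x_i, sigma^2 I) smooths the training set."""
    d = data.shape[1]
    a = -np.sum((x - data) ** 2, axis=1) / (2 * sigma**2)
    m = a.max()  # log-sum-exp shift for numerical stability
    return m + np.log(np.mean(np.exp(a - m))) - 0.5 * d * np.log(2 * np.pi * sigma**2)

def posterior_weights(x, data, sigma):
    """p(x0 = x_i | x_t = x) under a uniform prior over training points."""
    a = -np.sum((x - data) ** 2, axis=1) / (2 * sigma**2)
    w = np.exp(a - a.max())
    return w / w.sum()

def dsm_target_moments(x, data, sigma):
    """Mean and covariance, given x_t = x, of the DSM target (x0 - x_t) / sigma^2 (= -eps / sigma)."""
    w = posterior_weights(x, data, sigma)
    targets = (data - x) / sigma**2  # target value for each possible clean point x0 = x_i
    mean = w @ targets
    centered = targets - mean
    cov = (w[:, None] * centered).T @ centered
    return mean, cov

def numerical_score(x, data, sigma, h=1e-5):
    """Central finite-difference gradient of log_density, used as an independent check."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (log_density(x + e, data, sigma) - log_density(x - e, data, sigma)) / (2 * h)
    return g

data = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # hypothetical training set
sigma = 0.5
x_query = np.array([0.2, 0.3])
mean, cov = dsm_target_moments(x_query, data, sigma)
score_fd = numerical_score(x_query, data, sigma)
print(mean, score_fd)  # the conditional mean of the target matches the score

# Compare target variance midway between two training points vs. on top of one.
_, cov_gap = dsm_target_moments(np.array([0.0, 0.0]), data, sigma)
_, cov_near = dsm_target_moments(np.array([1.0, 0.0]), data, sigma)
print(np.trace(cov_gap), np.trace(cov_near))
```

In this toy example, the target's variance is far larger midway between training points than at a training point. How the paper connects this covariance structure to "gaps" being filled in goes beyond what the sketch shows.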

Related papers

Understanding Generalization in Diffusion Models via Probability Flow Distance [7.675910526644439]
We introduce the probability flow distance (PFD) to measure distributional generalization. We empirically uncover several key generalization behaviors in diffusion models.
arXiv Detail & Related papers (2025-05-26T15:23:50Z)
Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure [8.320632531909682]
We study the generalizability of diffusion models by looking into the hidden properties of the learned score functions. As diffusion models transition from memorization to generalization, their corresponding nonlinear diffusion denoisers exhibit increasing linearity.
arXiv Detail & Related papers (2024-10-31T15:57:04Z)
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models [22.39558434131574]
Existing data attribution methods for diffusion models typically quantify the contribution of a training sample. We argue that the direct usage of diffusion loss cannot represent such a contribution accurately due to the calculation of diffusion loss. We propose Diffusion Attribution Score (DAS) to measure the direct comparison between predicted distributions with an attribution score.
arXiv Detail & Related papers (2024-10-24T10:58:17Z)
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers. We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties. This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization [45.72323731094864]
We present a theoretical framework to analyze two-layer neural network-based diffusion models. We prove that training shallow neural networks for score prediction can be done by solving a single convex program. Our results provide a precise characterization of what neural network-based diffusion models learn in non-asymptotic settings.
arXiv Detail & Related papers (2024-02-03T00:20:25Z)
Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts. We present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep. We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining. We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
On the Generalization Properties of Diffusion Models [31.067038651873126]
This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models. We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models. We extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities.
arXiv Detail & Related papers (2023-11-03T09:20:20Z)
The Emergence of Reproducibility and Generalizability in Diffusion Models [10.188731323681575]
Given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We show that diffusion models are learning distinct distributions affected by the training data size. This valuable property generalizes to many variants of diffusion models, including those for conditional use, solving inverse problems, and model fine-tuning.
arXiv Detail & Related papers (2023-10-08T19:02:46Z)
On the Generalization of Diffusion Model [42.447639515467934]
We define the generalization of the generative model, which is measured by the mutual information between the generated data and the training set. We show that for the empirical optimal diffusion model, the data generated by a deterministic sampler are all highly related to the training set, implying poor generalization. We propose another training objective whose empirical optimal solution has no potential generalization problem.
arXiv Detail & Related papers (2023-05-24T04:27:57Z)
Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis of the approximation and generalization abilities of diffusion modeling. We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
arXiv Detail & Related papers (2023-03-03T11:31:55Z)
How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution. We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution. We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone. We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
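Two of the summaries above make claims that can be illustrated in a toy setting:

- The exact (empirical) DSM optimum, paired with a deterministic sampler, reproduces training data.
- For Gaussian data, the optimal denoiser is linear, which the hidden-Gaussian-structure summary relates to generalization.

The sketch below makes several assumptions and reproduces neither paper's analysis:

- It uses a variance-exploding forward process x = x0 + sigma*eps.
- It integrates the corresponding probability-flow ODE, dx/dsigma = (x - D(x))/sigma, with plain Euler steps on a geometric sigma schedule.
- The helpers (`empirical_denoiser`, `linear_gaussian_denoiser`, `probability_flow_sample`) and the toy datasets are hypothetical.

```python
import numpy as np

def empirical_denoiser(x, data, sigma):
    """E[x0 | x] with x0 uniform over training points: the exact DSM optimum, nonlinear in x."""
    a = -np.sum((x - data) ** 2, axis=1) / (2 * sigma**2)
    w = np.exp(a - a.max())
    return (w / w.sum()) @ data

def linear_gaussian_denoiser(x, mu, cov, sigma):
    """E[x0 | x] when x0 ~ N(mu, cov): an affine function of x."""
    gain = cov @ np.linalg.inv(cov + sigma**2 * np.eye(mu.size))
    return mu + gain @ (x - mu)

def probability_flow_sample(x_init, data, sigmas):
    """Euler steps of dx/dsigma = (x - D(x)) / sigma along a decreasing sigma schedule."""
    x = x_init.copy()
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s_next - s_cur) * (x - empirical_denoiser(x, data, s_cur)) / s_cur
    return x

rng = np.random.default_rng(0)

# (1) Deterministic sampling with the exact empirical optimum ends on training points.
train = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
sigmas = np.geomspace(10.0, 1e-4, 300)
samples = np.array([probability_flow_sample(sigmas[0] * rng.standard_normal(2), train, sigmas)
                    for _ in range(20)])
nearest = np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=2).min(axis=1)
print("max distance to a training point:", nearest.max())

# (2) With many Gaussian samples, the empirical optimum is close to the linear denoiser.
true_cov = np.array([[2.0, 0.0], [0.0, 0.5]])
gauss = rng.multivariate_normal(np.zeros(2), true_cov, size=20000)
mu_hat, cov_hat = gauss.mean(axis=0), np.cov(gauss, rowvar=False)
x = np.array([1.0, -0.5])
d_emp = empirical_denoiser(x, gauss, 1.0)
d_lin = linear_gaussian_denoiser(x, mu_hat, cov_hat, 1.0)
print(d_emp, d_lin)
```

**Step (1).** Once a trajectory's posterior weights become one-hot on some x_i, each Euler step scales x - x_i by sigma_next/sigma_cur. The final distance to that training point therefore shrinks in proportion to the smallest sigma.

**Step (2).** The nonlinear empirical denoiser and the affine Gaussian denoiser, built from the sample mean and covariance, approximately agree at sigma = 1.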

This list is automatically generated from the titles and abstracts of the papers on this site.