An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models
- URL: http://arxiv.org/abs/2503.03206v1
- Date: Wed, 05 Mar 2025 05:50:38 GMT
- Title: An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models
- Authors: Binxu Wang
- Abstract summary: We developed an analytical framework for understanding how the learned distribution evolves during diffusion model training. We derived exact solutions for the gradient-flow dynamics of weights in one- or two-layer linear denoiser settings with arbitrary data. These solutions allowed us to derive the generated distribution in closed form and its KL divergence through training.
- Score: 2.1756081703276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We developed an analytical framework for understanding how the learned distribution evolves during diffusion model training. Leveraging the Gaussian equivalence principle, we derived exact solutions for the gradient-flow dynamics of weights in one- or two-layer linear denoiser settings with arbitrary data. Remarkably, these solutions allowed us to derive the generated distribution in closed form and its KL divergence through training. These analytical results expose a pronounced power-law spectral bias, i.e., for weights and distributions, the convergence time of a mode follows an inverse power law of its variance. Empirical experiments on both Gaussian and image datasets demonstrate that the power-law spectral bias remains robust even when using deeper or convolutional architectures. Our results underscore the importance of the data covariance in dictating the order and rate at which diffusion models learn different modes of the data, providing potential explanations for why earlier stopping could lead to incorrect details in image generative models.
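To make the headline claim concrete, below is a minimal numerical sketch, not the paper's exact construction, of the power-law spectral bias in a one-layer linear denoiser. The variances `lams`, noise level `sigma`, learning rate, and the 90% convergence threshold are all illustrative assumptions. With Gaussian data of covariance diag(lams), the population denoising objective decouples across modes: each diagonal weight relaxes toward its optimum lam/(lam + sigma^2) at a rate proportional to (lam + sigma^2), so its convergence time scales roughly as 1/lam for high-variance modes.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's exact setup):
# train a one-layer linear denoiser x_hat = W @ y on Gaussian data
# x ~ N(0, diag(lams)) with noisy inputs y = x + sigma * eps, and record
# the step at which each mode converges.

rng = np.random.default_rng(0)
lams = np.array([1.0, 0.3, 0.1, 0.03, 0.01])   # per-mode data variances (assumed)
sigma = 0.05                                    # noise level of the denoising task
d = len(lams)

n = 20_000
X = rng.normal(size=(n, d)) * np.sqrt(lams)     # clean data, covariance diag(lams)
Y = X + sigma * rng.normal(size=(n, d))         # noisy inputs

W = np.zeros((d, d))                            # denoiser weights, initialized at zero
lr = 0.05
w_opt = lams / (lams + sigma**2)                # population optimum is diagonal
t_hit = np.full(d, -1)                          # first step each mode reaches 90% of optimum

for t in range(4000):
    resid = Y @ W.T - X                         # prediction errors, shape (n, d)
    W -= lr * (resid.T @ Y) / n                 # gradient step on 0.5 * mean ||W y - x||^2
    done = (np.diag(W) >= 0.9 * w_opt) & (t_hit < 0)
    t_hit[done] = t

print("mode variance :", lams)
print("90% conv. step:", t_hit)  # grows roughly like 1/(lam + sigma^2) as variance shrinks
```

Under gradient flow the diagonal entries follow w_k(t) = w_k* (1 - e^{-(lam_k + sigma^2) t}), so the printed step counts should scale approximately inversely with the mode variance, consistent with the inverse-power-law convergence times the abstract describes.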
Related papers
- Generalization through variance: how noise shapes inductive biases in diffusion models [0.0]
We develop a mathematical theory that partly explains the 'generalization through variance' phenomenon.
We find that the distributions diffusion models effectively learn to sample from resemble their training distributions.
We also characterize how this inductive bias interacts with feature-related inductive biases.
arXiv Detail & Related papers (2025-04-16T23:41:10Z) - Critical Iterative Denoising: A Discrete Generative Model Applied to Graphs [52.50288418639075]
We propose a novel framework called Iterative Denoising, which simplifies discrete diffusion and circumvents the issue by assuming conditional independence across time.
Our empirical evaluations demonstrate that the proposed method significantly outperforms existing discrete diffusion baselines in graph generation tasks.
arXiv Detail & Related papers (2025-03-27T15:08:58Z) - Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure [8.320632531909682]
We study the generalizability of diffusion models by looking into the hidden properties of the learned score functions. As diffusion models transition from memorization to generalization, their corresponding nonlinear diffusion denoisers exhibit increasing linearity.
arXiv Detail & Related papers (2024-10-31T15:57:04Z) - On the Relation Between Linear Diffusion and Power Iteration [42.158089783398616]
We study the generation process as a "correlation machine".
We show that low frequencies emerge earlier in the generation process, where the denoising basis vectors are more aligned to the true data with a rate depending on their eigenvalues.
This model allows us to show that the linear diffusion model converges in mean to the leading eigenvector of the underlying data, similarly to the prevalent power iteration method.
arXiv Detail & Related papers (2024-10-16T07:33:12Z) - How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework [11.71206628091551]
We propose a comprehensive framework for the error analysis of discrete diffusion models based on Lévy-type integrals.
Our framework unifies and strengthens the current theoretical results on discrete diffusion models.
arXiv Detail & Related papers (2024-10-04T16:59:29Z) - Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instilling task-specific information into the score function to steer sample generation toward desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z) - On Error Propagation of Diffusion Models [77.91480554418048]
We develop a theoretical framework to mathematically formulate error propagation in the architecture of DMs.
We apply the cumulative error as a regularization term to reduce error propagation.
Our proposed regularization reduces error propagation, significantly improves vanilla DMs, and outperforms previous baselines.
arXiv Detail & Related papers (2023-08-09T15:31:17Z) - A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z) - A Variational Perspective on Solving Inverse Problems with Diffusion Models [101.831766524264]
Inverse tasks can be formulated as inferring a posterior distribution over data.
This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable.
We propose a variational approach that, by design, seeks to approximate the true posterior distribution.
arXiv Detail & Related papers (2023-05-07T23:00:47Z) - Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis of the approximation and generalization abilities of diffusion modeling.
We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
arXiv Detail & Related papers (2023-03-03T11:31:55Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z) - Diffusion-GAN: Training GANs with Diffusion [135.24433011977874]
Generative adversarial networks (GANs) are challenging to train stably.
We propose Diffusion-GAN, a novel GAN framework that leverages a forward diffusion chain to generate instance noise.
We show that Diffusion-GAN can produce more realistic images with higher stability and data efficiency than state-of-the-art GANs.
arXiv Detail & Related papers (2022-06-05T20:45:01Z)