On the Generalization of Diffusion Model
- URL: http://arxiv.org/abs/2305.14712v1
- Date: Wed, 24 May 2023 04:27:57 GMT
- Title: On the Generalization of Diffusion Model
- Authors: Mingyang Yi, Jiacheng Sun, Zhenguo Li
- Abstract summary: We define the generalization of the generative model, which is measured by the mutual information between the generated data and the training set.
We show that for the empirical optimal diffusion model, the data generated by a deterministic sampler are all highly related to the training set, implying poor generalization.
We propose another training objective whose empirical optimal solution has no potential generalization problem.
- Score: 42.447639515467934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion probabilistic generative models are widely used to generate
high-quality data. Though they can synthesize data that do not exist in the
training set, the rationale behind such generalization remains unexplored. In
this paper, we formally define the generalization of a generative model,
measured by the mutual information between the generated data and the training
set. The definition stems from the intuition that a model which generates data
less correlated with the training set exhibits better generalization ability.
Meanwhile, we show that for the empirical optimal diffusion model, the data
generated by a deterministic sampler are all highly dependent on the training
set, implying poor generalization. This result contradicts the observed
extrapolation ability (generating unseen data) of trained diffusion models,
which approximate the empirical optimum. To resolve this contradiction, we
empirically examine the gap between a sufficiently trained diffusion model and
the empirical optimum. We find that, even after sufficient training, a slight
difference between the two remains, and this difference is critical to making
the diffusion model generalizable. Moreover, we propose an alternative training
objective whose empirical optimal solution has no such generalization problem.
We empirically show that the proposed objective yields a model similar to the
one trained with the original objective, which further supports the
generalization ability of the trained diffusion model.
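A minimal NumPy sketch (not the paper's code) of the argument behind the abstract's central claim: under a standard variance-preserving forward process, the empirical optimal denoiser E[x_0 | x_t] has a closed form as a softmax-weighted average of training points, so a deterministic DDIM-style sampler maps every noise input back onto the training set. The generated data then becomes an (almost) deterministic function of the training set, which maximizes the mutual-information measure of (non-)generalization described above. The toy training set, noise schedule, and step count below are hypothetical illustration choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(8, 2))           # hypothetical toy "training set": 8 points in R^2

T = 200
betas = np.linspace(1e-4, 2e-2, T)        # assumed linear variance schedule
alpha_bar = np.cumprod(1.0 - betas)       # \bar{alpha}_t of the forward process

def optimal_x0(x_t, t):
    """Closed-form E[x_0 | x_t] when the data distribution is the empirical
    distribution over `train`: a softmax-weighted average of training points."""
    a = alpha_bar[t]
    # log-density of x_t under N(sqrt(a) * x_i, (1 - a) I), up to a shared constant
    logits = -np.sum((x_t - np.sqrt(a) * train) ** 2, axis=1) / (2.0 * (1.0 - a))
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ train

def ddim_sample(x_T):
    """Deterministic DDIM trajectory driven by the empirical-optimal denoiser."""
    x = x_T
    for t in range(T - 1, 0, -1):
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        x0_hat = optimal_x0(x, t)
        eps_hat = (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x

samples = np.array([ddim_sample(rng.normal(size=2)) for _ in range(16)])
# Every sample ends up (essentially) on a training point: the generated data
# is an almost deterministic function of the training set.
dists = np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=-1).min(axis=1)
print("max distance to nearest training point:", dists.max())
```

Running the sketch, each generated sample should land essentially on top of some training point, which is the "highly related to the training set" behavior the abstract attributes to the empirical optimum; a trained model only approximates this denoiser, and the paper argues that the remaining slight difference is what makes generalization possible.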
Related papers
- Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure [8.320632531909682]
We study the generalizability of diffusion models by looking into the hidden properties of the learned score functions.
As diffusion models transition from memorization to generalization, their corresponding nonlinear diffusion denoisers exhibit increasing linearity.
arXiv Detail & Related papers (2024-10-31T15:57:04Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- On the Generalization Properties of Diffusion Models [33.93850788633184]
This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models.
We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models.
We extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities.
arXiv Detail & Related papers (2023-11-03T09:20:20Z)
- The Emergence of Reproducibility and Generalizability in Diffusion Models [10.188731323681575]
Given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs.
We show that diffusion models are learning distinct distributions affected by the training data size.
This valuable property generalizes to many variants of diffusion models, including those for conditional use, solving inverse problems, and model fine-tuning.
arXiv Detail & Related papers (2023-10-08T19:02:46Z)
- On Memorization in Diffusion Models [46.656797890144105]
We show that memorization behaviors tend to occur on smaller-sized datasets.
We quantify the impact of the influential factors on these memorization behaviors in terms of effective model memorization (EMM).
Our study holds practical significance for diffusion model users and offers clues to theoretical research in deep generative models.
arXiv Detail & Related papers (2023-10-04T09:04:20Z)
- Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that Diff-Instruct consistently improves their pre-trained generators.
arXiv Detail & Related papers (2023-05-29T04:22:57Z)
- Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis on approximation and generalization abilities of diffusion modeling.
We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
arXiv Detail & Related papers (2023-03-03T11:31:55Z)
- Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
- Generalization and Memorization: The Bias Potential Model [9.975163460952045]
Generative models and density estimators behave quite differently from models for learning functions.
For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted.
In the long term, the model either memorizes the samples or diverges.
arXiv Detail & Related papers (2020-11-29T04:04:54Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)