Generalization and Memorization: The Bias Potential Model
- URL: http://arxiv.org/abs/2011.14269v4
- Date: Tue, 2 Mar 2021 03:57:31 GMT
- Title: Generalization and Memorization: The Bias Potential Model
- Authors: Hongkang Yang and Weinan E
- Abstract summary: Models for learning probability distributions, such as generative models and density estimators, behave quite differently from models for learning functions.
For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted.
In the long term, the model either memorizes the samples or diverges.
- Score: 9.975163460952045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models for learning probability distributions such as generative models and
density estimators behave quite differently from models for learning functions.
One example is found in the memorization phenomenon, namely the ultimate
convergence to the empirical distribution, which occurs in generative
adversarial networks (GANs). For this reason, the issue of generalization is
more subtle than it is for supervised learning. For the bias potential model, we
show that dimension-independent generalization accuracy is achievable if early
stopping is adopted, even though in the long term the model either memorizes
the samples or diverges.
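The early-stopping claim can be illustrated with a toy sketch. Below is a minimal, hypothetical 1D version of a bias-potential-style density model: the potential is linear in two hand-picked features (an assumption for illustration, not the paper's parameterization), trained by gradient descent on the negative log-likelihood while a held-out set is monitored; the checkpoint with the best validation likelihood plays the role of early stopping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D Gaussian data; hypothetical potential V_theta(x) = theta1*x + theta2*x^2,
# model density p_theta(x) proportional to exp(-V_theta(x)).
train = rng.normal(0.0, 1.0, 500)
val = rng.normal(0.0, 1.0, 500)

grid = np.linspace(-6.0, 6.0, 1201)   # quadrature grid for the partition function
dx = grid[1] - grid[0]

def feats(x):
    return np.stack([x, x ** 2], axis=-1)

def nll(theta, xs):
    """Negative log-likelihood: mean V_theta(x) over xs, plus log Z(theta)."""
    log_z = np.log(np.sum(np.exp(-feats(grid) @ theta)) * dx)
    return float(np.mean(feats(xs) @ theta) + log_z)

theta = np.zeros(2)
best_theta, best_val = theta.copy(), nll(theta, val)
for step in range(300):
    w = np.exp(-feats(grid) @ theta)
    w /= w.sum()
    # Exponential-family NLL gradient: E_data[phi] - E_model[phi]
    grad = feats(train).mean(axis=0) - (feats(grid) * w[:, None]).sum(axis=0)
    theta -= 0.05 * grad
    v = nll(theta, val)
    if v < best_val:                  # early stopping: keep the best held-out checkpoint
        best_val, best_theta = v, theta.copy()

print(round(best_theta[1], 2))        # roughly 0.5, the Gaussian potential x^2/2
```

With only two parameters this toy model does not actually memorize, so the sketch shows the training mechanics rather than the memorization/divergence dichotomy the paper analyzes.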
Related papers
- Universality in Transfer Learning for Linear Models [18.427215139020625]
We study the problem of transfer learning in linear models for both regression and binary classification.
We provide an exact and rigorous analysis and relate generalization errors (in regression) and classification errors (in binary classification) for the pretrained and fine-tuned models.
arXiv Detail & Related papers (2024-10-03T03:09:09Z) - Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem of learning with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z) - Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
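The difference-in-differences idea can be sketched numerically. The numbers below are synthetic (losses, effect sizes, and group construction are assumptions for illustration, not the paper's data): both groups share a common trend from general learning, and only the treated (trained-on) instances carry the extra memorisation effect that the estimator recovers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Per-instance losses before a training phase (synthetic).
before_treated = rng.normal(3.0, 0.3, n)   # instances included in training
before_control = rng.normal(3.0, 0.3, n)   # comparable held-out instances

# After training: both groups improve by a common trend (-0.5); treated
# instances additionally get the memorisation effect (-0.8) we want to recover.
after_treated = before_treated - 0.5 - 0.8 + rng.normal(0.0, 0.1, n)
after_control = before_control - 0.5 + rng.normal(0.0, 0.1, n)

# Difference-in-differences: the common trend cancels, leaving the effect.
did = (after_treated.mean() - before_treated.mean()) \
    - (after_control.mean() - before_control.mean())
print(round(did, 2))   # close to -0.8
```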
arXiv Detail & Related papers (2024-06-06T17:59:09Z) - On the Generalization Properties of Diffusion Models [33.93850788633184]
This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models.
We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models.
We extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities.
arXiv Detail & Related papers (2023-11-03T09:20:20Z) - On the Generalization of Diffusion Model [42.447639515467934]
We define the generalization of the generative model, which is measured by the mutual information between the generated data and the training set.
We show that for the empirical optimal diffusion model, the data generated by a deterministic sampler are all highly related to the training set, thus poor generalization.
We propose another training objective whose empirical optimal solution has no potential generalization problem.
arXiv Detail & Related papers (2023-05-24T04:27:57Z) - A Mathematical Framework for Learning Probability Distributions [0.0]
Generative modeling and density estimation have become immensely popular subjects in recent years.
This paper provides a mathematical framework such that all the well-known models can be derived based on simple principles.
In particular, we prove that these models enjoy implicit regularization during training, so that the generalization error at early-stopping avoids the curse of dimensionality.
arXiv Detail & Related papers (2022-12-22T04:41:45Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero-shot and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
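The decomposition can be checked directly on a tiny randomly initialized ReLU network (the weights and data here are arbitrary assumptions for illustration): each input's hidden-layer activation pattern indexes a linear subfunction, and the full empirical error equals the pattern-frequency-weighted mean of per-subfunction errors.

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(2)

# Tiny one-hidden-layer ReLU network with arbitrary weights.
W1 = rng.normal(size=(4, 2)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(1, 4)); b2 = rng.normal(size=1)

def forward(x):
    h = W1 @ x + b1
    # The sign pattern of h selects which linear piece (subfunction) is active.
    return (W2 @ np.maximum(h, 0.0) + b2)[0], tuple(h > 0)

X = rng.normal(size=(500, 2))
y = np.sin(X[:, 0])                       # arbitrary regression target
preds, patterns = zip(*(forward(x) for x in X))
errs = (np.array(preds) - y) ** 2

# Group squared errors by activation pattern (one group per subfunction).
groups = defaultdict(list)
for p, e in zip(patterns, errs):
    groups[p].append(e)

# Full empirical error = expectation over subfunctions of their mean error.
total = errs.mean()
decomposed = sum(len(v) / len(X) * np.mean(v) for v in groups.values())
print(np.isclose(total, decomposed))      # True
```

The equality is exact by construction, since the groups partition the sample; the paper's contribution is using the per-subfunction errors, domains, and activation patterns as reliability signals.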
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.