A Mathematical Framework for Learning Probability Distributions
- URL: http://arxiv.org/abs/2212.11481v1
- Date: Thu, 22 Dec 2022 04:41:45 GMT
- Title: A Mathematical Framework for Learning Probability Distributions
- Authors: Hongkang Yang
- Abstract summary: Generative modeling and density estimation have become immensely popular subjects in recent years.
This paper provides a mathematical framework such that all the well-known models can be derived based on simple principles.
In particular, we prove that these models enjoy implicit regularization during training, so that the generalization error at early-stopping avoids the curse of dimensionality.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The modeling of probability distributions, specifically generative modeling
and density estimation, has become an immensely popular subject in recent years
by virtue of its outstanding performance on sophisticated data such as images
and texts. Nevertheless, a theoretical understanding of its success is still
incomplete. One mystery is the paradox between memorization and generalization:
In theory, the model is trained to be exactly the same as the empirical
distribution of the finite samples, whereas in practice, the trained model can
generate new samples or estimate the likelihood of unseen samples. Likewise,
the overwhelming diversity of distribution learning models calls for a unified
perspective on this subject. This paper provides a mathematical framework such
that all the well-known models can be derived based on simple principles. To
demonstrate its efficacy, we present a survey of our results on the
approximation error, training error and generalization error of these models,
which can all be established based on this framework. In particular, the
aforementioned paradox is resolved by proving that these models enjoy implicit
regularization during training, so that the generalization error at
early-stopping avoids the curse of dimensionality. Furthermore, we provide some
new results on landscape analysis and the mode collapse phenomenon.
Related papers
- Distribution Learning and Its Application in Deep Learning [5.281849820329249]
This paper introduces a novel theoretical learning framework, termed probability distribution learning (PD learning).
PD learning focuses on learning the underlying probability distribution, which is modeled as a random variable within the probability simplex.
arXiv Detail & Related papers (2024-06-09T06:49:22Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Likelihood Based Inference in Fully and Partially Observed Exponential Family Graphical Models with Intractable Normalizing Constants [4.532043501030714]
Probabilistic graphical models that encode an underlying Markov random field are fundamental building blocks of generative modeling.
This paper demonstrates that full likelihood-based analysis of these models is feasible in a computationally efficient manner.
arXiv Detail & Related papers (2024-04-27T02:58:22Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Bi-Noising Diffusion: Towards Conditional Diffusion Models with Generative Restoration Priors [64.24948495708337]
We introduce a new method that brings predicted samples to the training data manifold using a pretrained unconditional diffusion model.
We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks.
arXiv Detail & Related papers (2022-12-14T17:26:35Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of the accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
- Generalization and Memorization: The Bias Potential Model [9.975163460952045]
Generative models and density estimators behave quite differently from models for learning functions.
For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted.
In the long term, the model either memorizes the samples or diverges.
arXiv Detail & Related papers (2020-11-29T04:04:54Z)
- Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning [29.473503894240096]
We focus on the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex.
This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others.
We propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing.
arXiv Detail & Related papers (2020-11-10T16:44:35Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)