A Mathematical Framework for Learning Probability Distributions
- URL: http://arxiv.org/abs/2212.11481v1
- Date: Thu, 22 Dec 2022 04:41:45 GMT
- Title: A Mathematical Framework for Learning Probability Distributions
- Authors: Hongkang Yang
- Abstract summary: Generative modeling and density estimation have become immensely popular subjects in recent years.
This paper provides a mathematical framework such that all the well-known models can be derived based on simple principles.
In particular, we prove that these models enjoy implicit regularization during training, so that the generalization error at early-stopping avoids the curse of dimensionality.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The modeling of probability distributions, specifically generative modeling
and density estimation, has become an immensely popular subject in recent years
by virtue of its outstanding performance on sophisticated data such as images
and texts. Nevertheless, a theoretical understanding of its success is still
incomplete. One mystery is the paradox between memorization and generalization:
In theory, the model is trained to be exactly the same as the empirical
distribution of the finite samples, whereas in practice, the trained model can
generate new samples or estimate the likelihood of unseen samples. Likewise,
the overwhelming diversity of distribution learning models calls for a unified
perspective on this subject. This paper provides a mathematical framework such
that all the well-known models can be derived based on simple principles. To
demonstrate its efficacy, we present a survey of our results on the
approximation error, training error and generalization error of these models,
which can all be established based on this framework. In particular, the
aforementioned paradox is resolved by proving that these models enjoy implicit
regularization during training, so that the generalization error at
early-stopping avoids the curse of dimensionality. Furthermore, we provide some
new results on landscape analysis and the mode collapse phenomenon.
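The tension between memorization and early-stopping generalization described above can be illustrated with a toy experiment. The sketch below is not one of the paper's models; it is a minimal assumed setup in which a 1-D Gaussian kernel density estimator is "trained" by gradient descent on its own training log-likelihood, so the bandwidth shrinks toward zero and the estimate collapses onto the empirical distribution, while a held-out likelihood identifies the early-stopping point.
```python
# Hedged illustration (not the paper's models): a 1-D Gaussian KDE whose bandwidth is
# fit by gradient descent on the training log-likelihood. The training likelihood keeps
# improving as the bandwidth shrinks toward zero (the estimate collapses onto the
# empirical distribution, i.e. memorization), while the held-out likelihood peaks at an
# intermediate bandwidth, so early stopping generalizes.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(size=50)          # finite samples from a true N(0, 1)
x_val = rng.normal(size=500)           # held-out samples from the same distribution

def kde_log_density(query, data, h):
    """Mean log density of `query` under a Gaussian KDE with bandwidth h built on `data`."""
    diffs = (query[:, None] - data[None, :]) / h
    log_kernel = -0.5 * diffs**2 - 0.5 * np.log(2 * np.pi) - np.log(h)
    return np.mean(np.logaddexp.reduce(log_kernel, axis=1) - np.log(len(data)))

log_h, lr, eps = np.log(1.0), 0.05, 1e-4
best = (np.inf, None, None)
for step in range(400):
    # numerical gradient of the training NLL with respect to log h
    train_nll = lambda lh: -kde_log_density(x_train, x_train, np.exp(lh))
    grad = (train_nll(log_h + eps) - train_nll(log_h - eps)) / (2 * eps)
    log_h -= lr * grad                 # training NLL falls as h shrinks toward zero
    val_nll = -kde_log_density(x_val, x_train, np.exp(log_h))
    if val_nll < best[0]:
        best = (val_nll, step, np.exp(log_h))

print(f"early stopping: step {best[1]}, bandwidth {best[2]:.3f}, val NLL {best[0]:.3f}")
print(f"final bandwidth {np.exp(log_h):.2e} (memorization regime)")
```
The validation curve is U-shaped while the training curve keeps decreasing, which is the elementary version of the implicit-regularization picture: stopping at the validation minimum returns a smoothed estimate rather than the empirical distribution.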
Related papers
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
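A minimal, hypothetical sketch of the difference-in-differences idea applied to memorisation (not the paper's estimator): compare how a model's loss on instances changes after the training window that contains them against the loss change on control instances never trained on during that window. All quantities below are illustrative placeholders.
```python
# Hypothetical difference-in-differences sketch for memorisation; illustrative numbers only.
import numpy as np

def did_memorisation(loss_treated_pre, loss_treated_post,
                     loss_control_pre, loss_control_post):
    """DiD estimate of the causal effect of training on an instance's loss."""
    treated_change = np.mean(loss_treated_post) - np.mean(loss_treated_pre)
    control_change = np.mean(loss_control_post) - np.mean(loss_control_pre)
    return treated_change - control_change   # more negative => stronger memorisation effect

rng = np.random.default_rng(1)
pre_t, post_t = rng.normal(3.0, 0.2, 100), rng.normal(2.2, 0.2, 100)   # trained-on instances
pre_c, post_c = rng.normal(3.0, 0.2, 100), rng.normal(2.8, 0.2, 100)   # held-out controls
print(f"estimated memorisation effect: {did_memorisation(pre_t, post_t, pre_c, post_c):.2f} nats")
```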
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Likelihood Based Inference in Fully and Partially Observed Exponential Family Graphical Models with Intractable Normalizing Constants [4.532043501030714]
Probabilistic graphical models that encode an underlying Markov random field are fundamental building blocks of generative modeling.
This paper demonstrates that full likelihood-based analysis of these models is feasible in a computationally efficient manner.
arXiv Detail & Related papers (2024-04-27T02:58:22Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
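A hedged sketch of such a self-consuming loop in the simplest setting (assumptions: 1-D data, a Gaussian KDE with fixed bandwidth, no mixing with real data): each generation refits the estimator on samples drawn from the previous generation's estimate, and the drift away from the true N(0, 1) distribution is tracked.
```python
# Self-consuming kernel density estimation: each generation is fit only on samples
# drawn from the previous generation's estimate, so estimation error accumulates.
import numpy as np

rng = np.random.default_rng(0)
h, n = 0.3, 200
data = rng.normal(size=n)                     # generation 0: real data from N(0, 1)

def sample_kde(data, h, n, rng):
    """Draw n samples from a Gaussian KDE: pick a data point, add N(0, h^2) noise."""
    centers = rng.choice(data, size=n, replace=True)
    return centers + h * rng.normal(size=n)

for gen in range(10):
    # drift proxy: how far the mean and variance move from the truth (0, 1)
    print(f"gen {gen}: mean {data.mean():+.3f}, var {data.var():.3f}")
    data = sample_kde(data, h, n, rng)        # the next generation sees only synthetic data
```
In this toy setting the variance inflates by roughly h^2 per generation plus resampling noise; mixing real data back into each generation (the "mixed data training" mentioned above) would be one way to damp this drift.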
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- On the Generalization Properties of Diffusion Models [33.93850788633184]
This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models.
We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models.
We extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities.
arXiv Detail & Related papers (2023-11-03T09:20:20Z)
- Bi-Noising Diffusion: Towards Conditional Diffusion Models with Generative Restoration Priors [64.24948495708337]
We introduce a new method that brings predicted samples to the training data manifold using a pretrained unconditional diffusion model.
We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks.
arXiv Detail & Related papers (2022-12-14T17:26:35Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
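A minimal sketch in the spirit of "model first, then sample" (this uses a simple grid approximation, not the paper's PSD models): approximate an unnormalized target from pointwise evaluations, then draw samples through the model's inverse CDF.
```python
# Two-step sampling sketch: (1) build a density model from pointwise evaluations,
# (2) sample from the model via its piecewise-linear inverse CDF.
import numpy as np

def unnormalized_target(x):
    # any non-negative function that we can only evaluate pointwise
    return np.exp(-x**2) * (2 + np.sin(5 * x))

# Step 1: model the distribution on a grid of evaluations.
grid = np.linspace(-4, 4, 256)
dx = grid[1] - grid[0]
values = unnormalized_target(grid)
pdf = values / (values.sum() * dx)            # normalized grid model of the density

# Step 2: sample from the model with inverse-CDF (inverse transform) sampling.
cdf = np.cumsum(pdf) * dx
cdf /= cdf[-1]
rng = np.random.default_rng(0)
samples = np.interp(rng.random(10_000), cdf, grid)
print(f"sample mean {samples.mean():+.3f}, sample std {samples.std():.3f}")
```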
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume that two models agree in their predictions more often than their accuracy levels alone would imply.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
- Generalization and Memorization: The Bias Potential Model [9.975163460952045]
Generative models and density estimators behave quite differently from models for learning functions.
For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted.
In the long term, the model either memorizes the samples or diverges.
arXiv Detail & Related papers (2020-11-29T04:04:54Z)
- Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning [29.473503894240096]
We focus on the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex.
This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others.
We propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing.
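A small sketch of the practice described above (an assumed setup, not the paper's code or its proposed alternatives): the categorical cross-entropy applied to simplex-valued targets produced by label smoothing, rather than to strictly one-hot labels.
```python
# Categorical cross-entropy evaluated against soft, simplex-valued targets.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(targets, probs):
    """Mean cross-entropy H(targets, probs) for targets anywhere on the simplex."""
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=-1))

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))                # 4 examples, 5 classes
labels = np.array([0, 2, 1, 4])
one_hot = np.eye(5)[labels]
eps = 0.1                                        # label-smoothing weight
soft = (1 - eps) * one_hot + eps / 5             # targets now lie in the simplex interior

print(f"CE vs one-hot targets:  {cross_entropy(one_hot, softmax(logits)):.3f}")
print(f"CE vs smoothed targets: {cross_entropy(soft, softmax(logits)):.3f}")
```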
arXiv Detail & Related papers (2020-11-10T16:44:35Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.