On The Statistical Representation Properties Of The Perturb-Softmax And The Perturb-Argmax Probability Distributions
- URL: http://arxiv.org/abs/2406.02180v1
- Date: Tue, 4 Jun 2024 10:22:12 GMT
- Title: On The Statistical Representation Properties Of The Perturb-Softmax And The Perturb-Argmax Probability Distributions
- Authors: Hedda Cohen Indelman, Tamir Hazan
- Abstract summary: The Gumbel-Softmax probability distribution allows learning discrete tokens in generative learning, while the Gumbel-Argmax probability distribution is useful in learning discrete structures in discriminative learning.
Despite the efforts invested in optimizing these probability models, their statistical properties are under-explored.
We investigate their representation properties and determine for which families of parameters these probability distributions are complete.
We conclude the analysis by identifying two sets of parameters that satisfy these assumptions and thus admit a complete and minimal representation.
- Score: 17.720298535412443
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Gumbel-Softmax probability distribution allows learning discrete tokens in generative learning, while the Gumbel-Argmax probability distribution is useful in learning discrete structures in discriminative learning. Despite the efforts invested in optimizing these probability models, their statistical properties are under-explored. In this work, we investigate their representation properties and determine for which families of parameters these probability distributions are complete, i.e., can represent any probability distribution, and minimal, i.e., can represent a probability distribution uniquely. We rely on convexity and differentiability to determine these statistical conditions and extend this framework to general probability models, such as Gaussian-Softmax and Gaussian-Argmax. We experimentally validate the qualities of these extensions, which enjoy a faster convergence rate. We conclude the analysis by identifying two sets of parameters that satisfy these assumptions and thus admit a complete and minimal representation. Our contribution is theoretical with supporting practical evaluation.
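For concreteness, the following is a minimal NumPy sketch of how a perturb-softmax (relaxed) and a perturb-argmax (hard) sample can be drawn with either Gumbel or Gaussian perturbations, as discussed in the abstract. The function names, the temperature parameter `tau`, and the toy parameters are illustrative choices made here, not the authors' implementation.

```python
# Minimal sketch of perturb-softmax / perturb-argmax sampling (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def perturb_softmax(theta, tau=1.0, noise="gumbel"):
    """Relaxed sample: softmax of perturbed parameters (Gumbel- or Gaussian-Softmax)."""
    gamma = rng.gumbel(size=theta.shape) if noise == "gumbel" else rng.normal(size=theta.shape)
    return softmax((theta + gamma) / tau)

def perturb_argmax(theta, noise="gumbel"):
    """Hard sample: one-hot indicator of the perturbed maximizer (Gumbel- or Gaussian-Argmax)."""
    gamma = rng.gumbel(size=theta.shape) if noise == "gumbel" else rng.normal(size=theta.shape)
    one_hot = np.zeros_like(theta)
    one_hot[np.argmax(theta + gamma)] = 1.0
    return one_hot

theta = np.log(np.array([0.5, 0.3, 0.2]))       # parameters of a 3-way discrete distribution
print(perturb_softmax(theta, tau=0.5))          # relaxed (simplex-valued) sample
print(perturb_argmax(theta, noise="gaussian"))  # hard one-hot sample
```

With Gumbel noise, taking the argmax of the perturbed parameters is the standard Gumbel-Max trick and yields exact samples from the softmax distribution of theta; with Gaussian noise the induced distribution differs, and questions such as which parameter families make these perturbed distributions complete and minimal are exactly what the paper's representation analysis addresses.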
Related papers
- A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models [6.647819824559201]
We study the large-sample properties of a likelihood-based approach for estimating conditional deep generative models.
Our results lead to the convergence rate of a sieve maximum likelihood estimator for estimating the conditional distribution.
arXiv Detail & Related papers (2024-10-02T20:46:21Z) - Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution [2.1146241717926664]
We show that the Wasserstein GAN, constrained to left-invertible push-forward maps, generates distributions that avoid replication and significantly deviate from the empirical distribution.
Our most important contribution provides a finite-sample lower bound on the Wasserstein-1 distance between the generative distribution and the empirical one.
We also establish a finite-sample upper bound on the distance between the generative distribution and the true data-generating one.
arXiv Detail & Related papers (2023-07-31T06:11:57Z) - A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z) - Ensemble Multi-Quantiles: Adaptively Flexible Distribution Prediction for Uncertainty Quantification [4.728311759896569]
We propose Ensemble Multi-Quantiles (EMQ), a novel, succinct, and effective approach for distribution prediction to quantify uncertainty in machine learning.
It incorporates adaptively flexible distribution prediction of $\mathbb{P}(\mathbf{y}\mid\mathbf{X}=x)$ in regression tasks.
On extensive regression tasks from UCI datasets, we show that EMQ achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-11-26T11:45:32Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - Wrapped Distributions on homogeneous Riemannian manifolds [58.720142291102135]
Control over distribution properties, such as parameters, symmetry, and modality, yields a family of flexible distributions.
We empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model.
arXiv Detail & Related papers (2022-04-20T21:25:21Z) - Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z) - Learning Structured Gaussians to Approximate Deep Ensembles [10.055143995729415]
This paper proposes using a sparse-structured multivariate Gaussian to provide a closed-form approximator for dense image prediction tasks.
We capture the uncertainty and structured correlations in the predictions explicitly in a formal distribution, rather than implicitly through sampling alone.
We demonstrate the merits of our approach on monocular depth estimation and show that the advantages of our approach are obtained with comparable quantitative performance.
arXiv Detail & Related papers (2022-03-29T12:34:43Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - Distributionally Robust Parametric Maximum Likelihood Estimation [13.09499764232737]
We propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric nominal distribution.
Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.
arXiv Detail & Related papers (2020-10-11T19:05:49Z) - Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set; a minimal sketch of this estimate appears after this list.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
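As referenced in the Distributionally Robust Bayesian Quadrature Optimization entry above, the standard (non-robust) BQO objective replaces the expected objective with an average over the fixed i.i.d. samples and maximizes that average. The sketch below is a hypothetical toy illustration of that estimate only; the objective f, the samples, and the grid search are illustrative choices made here and do not reflect the DRBQO algorithm.

```python
# Toy sketch of the Monte Carlo objective used by standard BQO (illustrative only).
import numpy as np

rng = np.random.default_rng(1)

def f(x, w):
    """Toy objective f(x, w); BQO cares about E_w[f(x, w)] under an unknown w-distribution."""
    return -(x - w) ** 2

w_samples = rng.normal(loc=0.5, scale=1.0, size=20)  # the fixed i.i.d. samples of w

def mc_estimate(x):
    """Monte Carlo estimate of the expected objective from the fixed sample set."""
    return np.mean([f(x, w) for w in w_samples])

# Standard BQO maximizes this estimate over x; a simple grid search stands in here
# for the Bayesian-optimization machinery.
grid = np.linspace(-2.0, 2.0, 401)
x_star = grid[np.argmax([mc_estimate(x) for x in grid])]
print(x_star)  # close to the sample mean of w for this quadratic toy objective
```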
This list is automatically generated from the titles and abstracts of the papers on this site.