The continuous categorical: a novel simplex-valued exponential family
- URL: http://arxiv.org/abs/2002.08563v2
- Date: Mon, 8 Jun 2020 17:13:08 GMT
- Title: The continuous categorical: a novel simplex-valued exponential family
- Authors: Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John P. Cunningham
- Abstract summary: We show that standard choices for simplex-valued data suffer from a number of limitations, including bias and numerical issues.
We resolve these limitations by introducing a novel exponential family of distributions for modeling simplex-valued data.
Unlike the Dirichlet and other typical choices, the continuous categorical results in a well-behaved probabilistic loss function.
- Score: 23.983555024375306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simplex-valued data appear throughout statistics and machine learning, for
example in the context of transfer learning and compression of deep networks.
Existing models for this class of data rely on the Dirichlet distribution or
other related loss functions; here we show these standard choices suffer
systematically from a number of limitations, including bias and numerical
issues that frustrate the use of flexible network models upstream of these
distributions. We resolve these limitations by introducing a novel exponential
family of distributions for modeling simplex-valued data - the continuous
categorical, which arises as a nontrivial multivariate generalization of the
recently discovered continuous Bernoulli. Unlike the Dirichlet and other
typical choices, the continuous categorical results in a well-behaved
probabilistic loss function that produces unbiased estimators, while preserving
the mathematical simplicity of the Dirichlet. As well as exploring its
theoretical properties, we introduce sampling methods for this distribution
that are amenable to the reparameterization trick, and evaluate their
performance. Lastly, we demonstrate that the continuous categorical outperforms
standard choices empirically, across a simulation study, an applied example on
multi-party elections, and a neural network compression task.
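To make the loss described in the abstract concrete, below is a minimal NumPy sketch (not the authors' implementation) of a continuous categorical negative log-likelihood. It assumes only what the abstract implies: as the multivariate analogue of the continuous Bernoulli (whose density is proportional to $\lambda^x(1-\lambda)^{1-x}$), the continuous categorical density is taken proportional to $\prod_k \lambda_k^{x_k}$ on the simplex, so the loss is the familiar cross-entropy term plus a log normalizing constant. The normalizer is estimated here by Monte Carlo over uniform simplex samples purely for illustration; the paper derives an exact closed form and reparameterizable samplers, neither of which this sketch reproduces, and all function names are hypothetical.

```python
import numpy as np


def log_cc_normalizer_mc(lam, n_samples=100_000, seed=0):
    """Monte Carlo estimate of log C(lam) for a continuous categorical
    distribution assumed proportional to prod_k lam_k**x_k on the simplex.

    The paper gives an exact closed form; this estimator is only an
    illustrative stand-in.
    """
    rng = np.random.default_rng(seed)
    K = lam.shape[0]
    # Dirichlet(1,...,1) samples are uniform on the simplex.
    x = rng.dirichlet(np.ones(K), size=n_samples)
    log_unnorm = x @ np.log(lam)                 # log of prod_k lam_k**x_k
    # Lebesgue volume of the (K-1)-simplex (coordinate parameterization) is 1/(K-1)!.
    log_volume = -np.sum(np.log(np.arange(1, K)))
    # Numerically stable log of the integral: log E_uniform[exp(.)] + log volume.
    m = log_unnorm.max()
    log_integral = m + np.log(np.mean(np.exp(log_unnorm - m))) + log_volume
    return -log_integral                         # log C(lam) = -log integral


def cc_negative_log_likelihood(x, lam):
    """-log p(x | lam): the cross-entropy term minus log C(lam), i.e. a
    properly normalized probabilistic loss for simplex-valued x."""
    cross_entropy = -np.dot(x, np.log(lam))
    return cross_entropy - log_cc_normalizer_mc(lam)


if __name__ == "__main__":
    lam = np.array([0.5, 0.3, 0.2])   # simplex-valued parameter
    x = np.array([0.6, 0.3, 0.1])     # simplex-valued observation
    print(cc_negative_log_likelihood(x, lam))
```

Because the normalizing constant depends on the parameter, minimizing this quantity is maximum likelihood under a proper distribution on the simplex, which is the property the abstract contrasts with raw cross-entropy and with the Dirichlet.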
Related papers
- Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the one implicitly assumed by a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning [29.473503894240096]
We focus on the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex.
This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others.
We propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing.
arXiv Detail & Related papers (2020-11-10T16:44:35Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Distribution Regression for Sequential Data [20.77698059067596]
We develop a rigorous framework for distribution regression where inputs are complex data streams.
We introduce two new learning techniques, one feature-based and the other kernel-based.
arXiv Detail & Related papers (2020-06-10T12:47:23Z)