Self-regularizing Property of Nonparametric Maximum Likelihood Estimator
in Mixture Models
- URL: http://arxiv.org/abs/2008.08244v2
- Date: Mon, 7 Sep 2020 01:10:12 GMT
- Title: Self-regularizing Property of Nonparametric Maximum Likelihood Estimator
in Mixture Models
- Authors: Yury Polyanskiy and Yihong Wu
- Abstract summary: We study the nonparametric maximum likelihood estimator (NPMLE) for general Gaussian mixtures.
We show that with high probability the NPMLE based on a sample of size $n$ has $O(\log n)$ atoms (mass points).
Notably, any such Gaussian mixture is statistically indistinguishable from a finite one with $O(\log n)$ components, so the NPMLE attains the right model complexity without any explicit model selection.
- Score: 39.27013036481509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Introduced by Kiefer and Wolfowitz \cite{KW56}, the nonparametric maximum
likelihood estimator (NPMLE) is a widely used methodology for learning mixture
models and empirical Bayes estimation. Sidestepping the non-convexity in mixture
likelihood, the NPMLE estimates the mixing distribution by maximizing the total
likelihood over the space of probability measures, which can be viewed as an
extreme form of overparameterization.
In this paper we discover a surprising property of the NPMLE solution.
Consider, for example, a Gaussian mixture model on the real line with a
subgaussian mixing distribution. Leveraging complex-analytic techniques, we
show that with high probability the NPMLE based on a sample of size $n$ has
$O(\log n)$ atoms (mass points), significantly improving the deterministic
upper bound of $n$ due to Lindsay \cite{lindsay1983geometry1}. Notably, any
such Gaussian mixture is statistically indistinguishable from a finite one with
$O(\log n)$ components (and this is tight for certain mixtures). Thus, absent
any explicit form of model selection, NPMLE automatically chooses the right
model complexity, a property we term \emph{self-regularization}. Extensions to
other exponential families are given. As a statistical application, we show
that this structural property can be harnessed to bootstrap existing Hellinger
risk bound of the (parametric) MLE for finite Gaussian mixtures to the NPMLE
for general Gaussian mixtures, recovering a result of Zhang
\cite{zhang2009generalized}.
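In symbols, for the Gaussian location model the Kiefer-Wolfowitz NPMLE maximizes the mixture likelihood over all mixing distributions,
$$\hat{\pi} \in \arg\max_{\pi \in \mathcal{P}(\mathbb{R})} \frac{1}{n}\sum_{i=1}^{n} \log \int \varphi(X_i - \theta)\, \mathrm{d}\pi(\theta),$$
where $\varphi$ is the standard normal density and $\mathcal{P}(\mathbb{R})$ is the set of all probability measures on the real line; the theorem above says that, with high probability, the maximizer is supported on only $O(\log n)$ atoms.
The snippet below is a minimal numerical sketch of this self-regularization effect, not the authors' code: it approximates the NPMLE by restricting the mixing measure to a fixed grid of candidate atoms, runs the classical EM fixed-point update on the weights, and reports which atoms retain non-negligible mass. The grid size, iteration count, and pruning threshold are arbitrary illustrative choices.
```python
import numpy as np

def npmle_on_grid(x, grid, n_iter=5000):
    """Maximize (1/n) * sum_i log(sum_j w_j * phi(x_i - grid_j)) over the simplex."""
    lik = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
    w = np.full(len(grid), 1.0 / len(grid))      # start from uniform weights
    for _ in range(n_iter):
        mix = lik @ w                            # mixture density at each sample
        w *= (lik / mix[:, None]).mean(axis=0)   # EM multiplicative update (stays on the simplex)
    return w

rng = np.random.default_rng(0)
n = 2000
# sample from a 3-component, unit-variance Gaussian mixture (subgaussian mixing distribution)
x = rng.choice([-2.0, 0.0, 2.0], size=n) + rng.standard_normal(n)

grid = np.linspace(x.min(), x.max(), 400)        # candidate atom locations
w = npmle_on_grid(x, grid)

# The vast majority of the 400 candidate atoms are driven to essentially zero
# weight; the surviving mass clusters around a few locations (discretization and
# finite iterations may spread one NPMLE atom over neighboring grid points),
# consistent with the O(log n) bound on the number of atoms.
atoms = grid[w > 1e-3]
print(f"n = {n}: {len(atoms)} of {len(grid)} grid points carry weight > 1e-3")
print("surviving atom locations:", np.round(atoms, 2))
```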
Related papers
- Learning large softmax mixtures with warm start EM [17.081578976570437]
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates.
Softmax mixtures are routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector.
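A minimal sketch of what such a softmax mixture computes (an illustration with made-up dimensions and parameters, not the construction analyzed in the paper): $p$ candidate vectors in $\mathbb{R}^L$ are mapped to a single probability vector by mixing $K$ softmax heads.
```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
p, L, K = 1000, 16, 5                     # candidates, embedding dimension, mixture components
Z = rng.standard_normal((p, L))           # the p candidate vectors in R^L
beta = rng.standard_normal((K, L))        # one parameter vector per mixture component (hypothetical)
alpha = softmax(rng.standard_normal(K))   # mixture weights on the simplex

# mixed multinomial logit / softmax mixture:
#   P(choose candidate j) = sum_k alpha_k * softmax_j(Z @ beta_k)
probs = alpha @ softmax(Z @ beta.T, axis=0).T
assert np.isclose(probs.sum(), 1.0)       # a valid probability vector over the p candidates
```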
arXiv Detail & Related papers (2024-09-16T00:14:48Z)
- Universality laws for Gaussian mixtures in generalized linear models [22.154969876570238]
We investigate the joint statistics of the family of generalized linear estimators $(\Theta_1, \dots, \Theta_M)$.
This allows us to prove the universality of different quantities of interest, such as the training and generalization errors.
We discuss the applications of our results to different machine learning tasks of interest, such as ensembling and uncertainty.
arXiv Detail & Related papers (2023-02-17T15:16:06Z)
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
- Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation.
We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size.
We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z)
- Maximum Entropy Reinforcement Learning with Mixture Policies [54.291331971813364]
We construct a tractable approximation of the mixture entropy using MaxEnt algorithms.
We show that it is closely related to the sum of marginal entropies.
We derive an algorithmic variant of Soft Actor-Critic (SAC) to the mixture policy case and evaluate it on a series of continuous control tasks.
arXiv Detail & Related papers (2021-03-18T11:23:39Z)
- A Rigorous Link Between Self-Organizing Maps and Gaussian Mixture Models [78.6363825307044]
This work presents a mathematical treatment of the relation between Self-Organizing Maps (SOMs) and Gaussian Mixture Models (GMMs).
We show that energy-based SOM models can be interpreted as performing gradient descent.
This link allows us to treat SOMs as generative probabilistic models, giving a formal justification for using SOMs to detect outliers or for sampling.
arXiv Detail & Related papers (2020-09-24T14:09:04Z)
- Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the MPL estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z)
- The EM Algorithm gives Sample-Optimality for Learning Mixtures of Well-Separated Gaussians [21.780375994511324]
We prove a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally under separation $\Omega(\sqrt{\log k})$.
Our results do not assume or use prior knowledge of the (potentially different) Gaussian components.
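For context, the sketch below shows the generic EM iteration being analyzed, in the simplest one-dimensional, unit-variance setting (an illustration only; the separation condition $\Omega(\sqrt{\log k})$ and the convergence guarantees are results of the paper, not properties of this code).
```python
import numpy as np

def em_gmm_1d(x, k, n_iter=100, seed=0):
    """EM for a k-component, unit-variance Gaussian mixture in one dimension."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)    # initialize means at random data points
    w = np.full(k, 1.0 / k)                      # uniform mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] proportional to w_j * phi(x_i - mu_j)
        logr = np.log(w) - 0.5 * (x[:, None] - mu[None, :]) ** 2
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and means from the responsibilities
        w = r.mean(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return w, mu

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(m, 1.0, 500) for m in (-4.0, 0.0, 4.0)])  # well-separated components
print(em_gmm_1d(x, k=3))
```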
arXiv Detail & Related papers (2020-02-02T05:09:26Z)
- Algebraic and Analytic Approaches for Parameter Learning in Mixture Models [66.96778152993858]
We present two different approaches for parameter learning in several mixture models in one dimension.
For some of these distributions, our results represent the first guarantees for parameter estimation.
arXiv Detail & Related papers (2020-01-19T05:10:56Z)
- Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, in terms of which commonly used models such as the Dirichlet process and the beta-Bernoulli process can be expressed, are decomposable into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)