Improved Convergence Guarantees for Learning Gaussian Mixture Models by
EM and Gradient EM
- URL: http://arxiv.org/abs/2101.00575v1
- Date: Sun, 3 Jan 2021 08:10:01 GMT
- Title: Improved Convergence Guarantees for Learning Gaussian Mixture Models by
EM and Gradient EM
- Authors: Nimrod Segol, Boaz Nadler
- Abstract summary: We consider the problem of estimating the parameters of a Gaussian Mixture Model with K components of known weights.
We present a sharper analysis of the local convergence of EM and gradient EM, compared to previous works.
Our second contribution is improved sample size requirements for accurate estimation by EM and gradient EM.
- Score: 15.251738476719918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of estimating the parameters of a Gaussian Mixture Model
with K components of known weights, all with an identity covariance matrix. We
make two contributions. First, at the population level, we present a sharper
analysis of the local convergence of EM and gradient EM, compared to previous
works. Assuming a separation of $\Omega(\sqrt{\log K})$, we prove convergence
of both methods to the global optima from an initialization region larger than
those of previous works. Specifically, the initial guess of each component can
be as far as (almost) half its distance to the nearest Gaussian. This is
essentially the largest possible contraction region. Our second contribution
is improved sample size requirements for accurate estimation by EM and
gradient EM. In previous works, the required number of samples had a quadratic
dependence on the maximal separation between the K components, and the
resulting error estimate increased linearly with this maximal separation. In
this manuscript we show that both quantities depend only logarithmically on the
maximal separation.
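To make the setting concrete, the following is a minimal NumPy sketch of the two updates analyzed above for a spherical GMM with known weights and identity covariances: the exact EM mean update and a single gradient EM ascent step on the EM surrogate. The function names, step size, and demo data are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

def responsibilities(X, mus, weights):
    """E-step for a GMM with identity covariances: posterior probability
    that sample i was drawn from component k (shape (n, K))."""
    sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=-1)
    log_r = np.log(weights)[None, :] - 0.5 * sq_dists
    return np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))

def em_step(X, mus, weights):
    """Exact M-step of EM: each mean becomes the responsibility-weighted
    average of the samples."""
    R = responsibilities(X, mus, weights)
    return (R.T @ X) / R.sum(axis=0)[:, None]

def gradient_em_step(X, mus, weights, eta=1.0):
    """Gradient EM: one ascent step on the EM surrogate Q instead of an
    exact M-step; grad_k Q = (1/n) * sum_i r_ik * (x_i - mu_k)."""
    n = len(X)
    R = responsibilities(X, mus, weights)
    grad = (R.T @ X - R.sum(axis=0)[:, None] * mus) / n
    return mus + eta * grad

# Illustrative demo: three well-separated components, uniform known weights.
rng = np.random.default_rng(0)
true_mus = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
weights = np.full(3, 1.0 / 3.0)
labels = rng.choice(3, size=900, p=weights)
X = true_mus[labels] + rng.standard_normal((900, 2))
# Start each mean well inside (almost) half its distance to the nearest
# other component, mimicking the paper's contraction region.
mus = true_mus + np.array([[2.0, 2.0], [-2.0, 2.0], [2.0, -2.0]])
for _ in range(50):
    mus = em_step(X, mus, weights)  # or: gradient_em_step(X, mus, weights)
print(np.round(mus, 2))             # close to true_mus
```

Both updates share the same E-step; EM maximizes the surrogate in closed form, while gradient EM takes a single, cheaper ascent step.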
Related papers
- Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models [47.294535652946095]
We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMMs).
This is the first global convergence result for Gaussian mixtures with more than $2$ components.
arXiv Detail & Related papers (2024-06-29T16:44:29Z) - A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models [21.436254507839738]
We propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models.
We show that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
arXiv Detail & Related papers (2024-04-19T03:53:50Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions; a generic inverse-free solve in this spirit is sketched after this list.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Beyond EM Algorithm on Over-specified Two-Component Location-Scale
Gaussian Mixtures [29.26015093627193]
We develop the Exponential Location Update (ELU) algorithm to efficiently explore the curvature of the negative log-likelihood functions.
We demonstrate that the ELU algorithm converges to the final statistical radius of the models after a logarithmic number of iterations.
arXiv Detail & Related papers (2022-05-23T06:49:55Z) - Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation.
We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size.
We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs).
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the non-asymptotic concentration properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples.
arXiv Detail & Related papers (2020-02-26T16:37:37Z) - The EM Algorithm gives Sample-Optimality for Learning Mixtures of
Well-Separated Gaussians [21.780375994511324]
We prove a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally under a separation of $\Omega(\sqrt{\log k})$.
Our results do not assume or use prior knowledge of the (potentially different) Gaussian components.
arXiv Detail & Related papers (2020-02-02T05:09:26Z)
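The probabilistic unrolling entry above leans on replacing explicit matrix inversions with iterative linear solvers. As a generic illustration of that idea (not the paper's actual method), the sketch below solves $Ax = b$ with matrix-free conjugate gradients, touching $A$ only through matrix-vector products; the names and tolerances are assumptions.

```python
import numpy as np

def cg_solve(matvec, b, tol=1e-8, max_iter=200):
    """Conjugate gradients for symmetric positive definite A: solves
    A x = b given only v -> A @ v, so A is never formed or inverted."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Example: apply the inverse of a latent-Gaussian-style covariance
# A A^T + s^2 I without ever forming it explicitly.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)
x = cg_solve(lambda v: A @ (A.T @ v) + 0.01 * v, b)
print(np.linalg.norm(A @ (A.T @ x) + 0.01 * x - b))  # residual near 0
```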