The EM Algorithm gives Sample-Optimality for Learning Mixtures of
Well-Separated Gaussians
- URL: http://arxiv.org/abs/2002.00329v2
- Date: Fri, 19 Jun 2020 17:36:40 GMT
- Title: The EM Algorithm gives Sample-Optimality for Learning Mixtures of
Well-Separated Gaussians
- Authors: Jeongyeol Kwon, Constantine Caramanis
- Abstract summary: We prove a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally, under separation $\Omega(\sqrt{\log k})$.
Our results do not assume or use prior knowledge of the (potentially different) mixing weights or variances of the Gaussian components.
- Score: 21.780375994511324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of spherical Gaussian Mixture models with $k \geq 3$
components when the components are well separated. A fundamental previous
result established that separation of $\Omega(\sqrt{\log k})$ is necessary and
sufficient for identifiability of the parameters with polynomial sample
complexity (Regev and Vijayaraghavan, 2017). In the same context, we show that
$\tilde{O} (kd/\epsilon^2)$ samples suffice for any $\epsilon \lesssim 1/k$,
closing the gap from polynomial to linear, and thus giving the first optimal
sample upper bound for the parameter estimation of well-separated Gaussian
mixtures. We accomplish this by proving a new result for the
Expectation-Maximization (EM) algorithm: we show that EM converges locally,
under separation $\Omega(\sqrt{\log k})$. The previous best-known guarantee
required $\Omega(\sqrt{k})$ separation (Yan, et al., 2017). Unlike prior work,
our results do not assume or use prior knowledge of the (potentially different)
mixing weights or variances of the Gaussian components. Furthermore, our
results show that the finite-sample error of EM does not depend on
non-universal quantities such as pairwise distances between means of Gaussian
components.
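For reference, the sketch below shows the standard EM updates for a mixture of $k$ spherical Gaussians with unknown means, mixing weights, and per-component variances, which is the setting analyzed above. It is a minimal illustration only: the function name `em_spherical_gmm`, the random initialization, and the fixed iteration count are assumptions made for the sketch, not the authors' implementation or initialization scheme.
```python
import numpy as np

def em_spherical_gmm(X, k, n_iters=100, seed=0):
    """EM for a mixture of k spherical Gaussians on data X of shape (n, d)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Illustrative initialization: means at random data points, uniform
    # weights, unit variances. The paper analyzes local convergence, i.e.
    # behavior from a sufficiently good starting point.
    means = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full(k, 1.0 / k)
    variances = np.ones(k)

    for _ in range(n_iters):
        # E-step: responsibilities p(z_i = j | x_i), computed in log-space.
        sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        log_post = (np.log(weights)
                    - 0.5 * d * np.log(2.0 * np.pi * variances)
                    - sq_dists / (2.0 * variances))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stabilization
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)          # (n, k)

        # M-step: closed-form updates of weights, means, and spherical variances.
        counts = resp.sum(axis=0)                        # effective component counts
        weights = counts / n
        means = (resp.T @ X) / counts[:, None]
        sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq_dists).sum(axis=0) / (d * counts)

    return weights, means, variances
```
In the regime of the paper, these updates converge locally once the component means are $\Omega(\sqrt{\log k})$-separated and the initialization is sufficiently accurate, without prior knowledge of the weights or variances.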
Related papers
- Sample-Efficient Private Learning of Mixtures of Gaussians [11.392841244701605]
We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians.
Our work improves over the previous best result [AAL24b] and is provably optimal when $d$ is much larger than $k^2$.
arXiv Detail & Related papers (2024-11-04T17:23:52Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = \Omega(d^2/\mathrm{polylog}(d))$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for
Non-Spherical Gaussian Mixtures [9.670578317106182]
We consider mixtures of $k \geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated.
We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small.
We develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight.
arXiv Detail & Related papers (2021-12-10T10:51:44Z) - Improved Convergence Guarantees for Learning Gaussian Mixture Models by
EM and Gradient EM [15.251738476719918]
We consider the problem of estimating the parameters of a Gaussian Mixture Model with $K$ components of known weights.
We present a sharper analysis of the local convergence of EM and gradient EM, compared to previous works.
Our second contribution is improved sample size requirements for accurate estimation by EM and gradient EM.
arXiv Detail & Related papers (2021-01-03T08:10:01Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is ...
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs).
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z) - Algebraic and Analytic Approaches for Parameter Learning in Mixture
Models [66.96778152993858]
We present two different approaches for parameter learning in several mixture models in one dimension.
For some of these distributions, our results represent the first guarantees for parameter estimation.
arXiv Detail & Related papers (2020-01-19T05:10:56Z)