Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for
Non-Spherical Gaussian Mixtures
- URL: http://arxiv.org/abs/2112.05445v2
- Date: Wed, 7 Jun 2023 18:04:04 GMT
- Title: Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for
Non-Spherical Gaussian Mixtures
- Authors: Rares-Darius Buhai, David Steurer
- Abstract summary: We consider mixtures of $k\geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated.
We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small.
We develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight.
- Score: 9.670578317106182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider mixtures of $k\geq 2$ Gaussian components with unknown means and
unknown covariance (identical for all components) that are well-separated,
i.e., distinct components have statistical overlap at most $k^{-C}$ for a large
enough constant $C\ge 1$. Previous statistical-query [DKS17] and lattice-based
[BRST21, GVV22] lower bounds give formal evidence that even distinguishing such
mixtures from (pure) Gaussians may be exponentially hard (in $k$).
We show that this kind of hardness can only appear if mixing weights are
allowed to be exponentially small, and that for polynomially lower bounded
mixing weights non-trivial algorithmic guarantees are possible in
quasi-polynomial time. Concretely, we develop an algorithm based on the
sum-of-squares method with running time quasi-polynomial in the minimum mixing
weight. The algorithm can reliably distinguish between a mixture of $k\ge 2$
well-separated Gaussian components and a (pure) Gaussian distribution. As a
certificate, the algorithm computes a bipartition of the input sample that
separates a pair of mixture components, i.e., both sides of the bipartition
contain most of the sample points of at least one component.
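As a rough illustration of what such a certificate looks like (and emphatically not the paper's sum-of-squares procedure), the sketch below takes a candidate separating direction as given and splits the sample at the widest gap of its one-dimensional projection; the direction `v` and all other names here are assumptions made for the example.

```python
# Illustrative sketch only: forming a bipartition from a *given* candidate
# separating direction v. Finding v is the hard part, which the paper solves
# via the sum-of-squares method; here v is an assumed input.
import numpy as np

def bipartition_along(X, v):
    # Project the sample onto v and cut at the widest gap between
    # consecutive projected points.
    proj = X @ (v / np.linalg.norm(v))
    order = np.argsort(proj)
    cut = np.argmax(np.diff(proj[order]))  # index of the widest gap
    left = np.zeros(len(X), dtype=bool)
    left[order[:cut + 1]] = True
    return left  # boolean mask: True on one side of the cut

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (500, 2)) + np.array([-4.0, 0.0]),
               rng.normal(0.0, 1.0, (500, 2)) + np.array([+4.0, 0.0])])
mask = bipartition_along(X, v=np.array([1.0, 0.0]))
print(mask[:500].mean(), mask[500:].mean())  # ~1.0 and ~0.0 (or flipped)
```

With well-separated components and a direction roughly aligned with a mean difference, each side of the cut captures most of the points of at least one component, matching the certificate described above.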
For the special case of collinear means, our algorithm outputs a
$k$-clustering of the input sample that is approximately consistent with the
components of the mixture. We obtain similar clustering guarantees also for the
case that the overlap between any two mixture components is lower bounded
quasi-polynomially in $k$ (in addition to being upper bounded polynomially in
$k$).
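For intuition about the collinear case, here is a minimal sketch, assuming the common line direction `u` through the means is known (in the paper it must be recovered; the actual guarantee comes from sum-of-squares): project onto the line and run plain 1-D $k$-means on the projections.

```python
# Illustrative sketch: k-clustering when all component means lie on a common
# line with direction u. Here u is assumed known; the paper's algorithm does
# not receive it as input.
import numpy as np

def collinear_kmeans(X, u, k, iters=50):
    t = X @ (u / np.linalg.norm(u))  # 1-D projections onto the line
    centers = np.quantile(t, (np.arange(k) + 0.5) / k)  # spread-out init
    for _ in range(iters):
        # Assign each point to the nearest center, then recenter.
        labels = np.argmin(np.abs(t[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = t[labels == j].mean()
    return labels

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (300, 2)) + np.array([m, 0.0])
               for m in (-6.0, 0.0, 6.0)])
labels = collinear_kmeans(X, u=np.array([1.0, 0.0]), k=3)
print([np.bincount(labels[i:i + 300]).argmax() for i in (0, 300, 600)])
# one distinct label per true component, e.g. [0, 1, 2]
```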
A key technical ingredient is a characterization of separating directions for
well-separated Gaussian components in terms of ratios of polynomials that
correspond to moments of two carefully chosen orders logarithmic in the minimum
mixing weight.
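To make the moment-ratio idea concrete, the toy sketch below compares an empirical moment of even order $t \approx \log(1/w_{\min})$ along a fixed direction against the matching moment of a standard Gaussian. Everything here (plug-in moments, a known direction, unit component variances, the value of `w_min`) is a simplifying assumption for illustration; the paper instead certifies such statements within the sum-of-squares proof system, using two moment orders.

```python
# Toy version of a moment-ratio statistic along one direction. For a pure
# Gaussian the ratio concentrates near 1; along a separating direction of a
# well-separated mixture it is bounded away from 1.
import numpy as np

def gaussian_moment(t):
    # E[g^t] for g ~ N(0, 1): zero for odd t, (t - 1)!! for even t.
    if t % 2 == 1:
        return 0.0
    m = 1.0
    for i in range(1, t, 2):
        m *= i
    return m

def moment_ratio(x, t):
    # Empirical t-th moment of the standardized sample, divided by the
    # N(0,1) moment; values far from 1 flag a non-Gaussian direction.
    z = (x - x.mean()) / x.std()
    return np.mean(z ** t) / gaussian_moment(t)

rng = np.random.default_rng(0)
w_min = 0.1  # minimum mixing weight (illustrative value)
t = 2 * max(1, int(np.log(1.0 / w_min)))  # even order ~ log(1/w_min)

pure = rng.normal(size=100_000)
mix = np.concatenate([rng.normal(-3.0, 1.0, 50_000),
                      rng.normal(+3.0, 1.0, 50_000)])
print(moment_ratio(pure, t))  # ~ 1.0 for a pure Gaussian
print(moment_ratio(mix, t))   # bounded away from 1 (~ 0.46 here)
```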
Related papers
- Learning general Gaussian mixtures with efficient score matching [16.06356123715737]
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions.
We make no separation assumptions on the underlying mixture components.
We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in time polynomial in the sample size, and constructs a sampler for the mixture.
arXiv Detail & Related papers (2024-04-29T17:30:36Z) - Clustering Mixtures of Bounded Covariance Distributions Under Optimal
Separation [44.25945344950543]
We study the clustering problem for mixtures of bounded covariance distributions.
We give the first poly-time algorithm for this clustering task.
Our algorithm is robust to an $\Omega(\alpha)$-fraction of adversarial outliers.
arXiv Detail & Related papers (2023-12-19T01:01:53Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Robustly Learning Mixtures of $k$ Arbitrary Gaussians [47.40835932474677]
We give a polynomial-time algorithm for the problem of robustly estimating a mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the presence of a constant fraction of arbitrary corruptions.
Our main tools are an efficient *partial clustering* algorithm that relies on the sum-of-squares method, and a novel tensor decomposition algorithm that allows errors in both Frobenius norm and low-rank terms.
arXiv Detail & Related papers (2020-12-03T17:54:03Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs).
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z) - The EM Algorithm gives Sample-Optimality for Learning Mixtures of
Well-Separated Gaussians [21.780375994511324]
We prove a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally under separation $\Omega(\sqrt{\log k})$.
Our results do not assume or use prior knowledge of the (potentially different) Gaussian components.
arXiv Detail & Related papers (2020-02-02T05:09:26Z) - Algebraic and Analytic Approaches for Parameter Learning in Mixture
Models [66.96778152993858]
We present two different approaches for parameter learning in several mixture models in one dimension.
For some of these distributions, our results represent the first guarantees for parameter estimation.
arXiv Detail & Related papers (2020-01-19T05:10:56Z) - Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, in terms of which commonly used models such as the Dirichlet process and the beta-Bernoulli process can be expressed, decompose into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm enables scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)