Robustly Learning any Clusterable Mixture of Gaussians
- URL: http://arxiv.org/abs/2005.06417v1
- Date: Wed, 13 May 2020 16:44:12 GMT
- Title: Robustly Learning any Clusterable Mixture of Gaussians
- Authors: Ilias Diakonikolas, Samuel B. Hopkins, Daniel Kane, Sushrut Karmalkar
- Abstract summary: We study the efficient learnability of high-dimensional Gaussian mixtures in the outlier-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
- Score: 55.41573600814391
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the efficient learnability of high-dimensional Gaussian mixtures in
the outlier-robust setting, where a small constant fraction of the data is
adversarially corrupted. We resolve the polynomial learnability of this problem
when the components are pairwise separated in total variation distance.
Specifically, we provide an algorithm that, for any constant number of
components $k$, runs in polynomial time and learns the components of an
$\epsilon$-corrupted $k$-mixture within information theoretically near-optimal
error of $\tilde{O}(\epsilon)$, under the assumption that the overlap between
any pair of components $P_i, P_j$ (i.e., the quantity $1-TV(P_i, P_j)$) is
bounded by $\mathrm{poly}(\epsilon)$.
Our separation condition is the qualitatively weakest assumption under which
accurate clustering of the samples is possible. In particular, it allows for
components with arbitrary covariances and for components with identical means,
as long as their covariances differ sufficiently. Ours is the first polynomial
time algorithm for this problem, even for $k=2$.
Our algorithm follows the Sum-of-Squares-based proofs-to-algorithms approach.
Our main technical contribution is a new robust identifiability proof of
clusters from a Gaussian mixture, which can be captured by the constant-degree
Sum of Squares proof system. The key ingredients of this proof are a novel use
of SoS-certifiable anti-concentration and a new characterization of pairs of
Gaussians with small (dimension-independent) overlap in terms of their
parameter distance.
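To make the setup concrete, here is a minimal numerical sketch, not the paper's algorithm: it Monte Carlo-estimates the overlap $1-TV(P_i, P_j)$ for two Gaussians with identical means and different covariances (a pair the separation condition explicitly allows), and then builds an $\epsilon$-corrupted sample of the kind the algorithm must tolerate. The dimension, covariances, corruption fraction, and planted outliers are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d, n, eps = 10, 5000, 0.05  # dimension, sample size, corruption fraction (assumed)

# Two components with identical means but different covariances -- a pair the
# separation condition allows, provided their overlap 1 - TV is small.
P = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))
Q = multivariate_normal(mean=np.zeros(d), cov=4.0 * np.eye(d))

# Monte Carlo estimate of TV via the identity TV(P,Q) = E_{x~P}[max(0, 1 - q(x)/p(x))].
x = P.rvs(size=n, random_state=rng)
tv_hat = np.mean(np.maximum(0.0, 1.0 - np.exp(Q.logpdf(x) - P.logpdf(x))))
print(f"estimated overlap 1 - TV = {1.0 - tv_hat:.3f}")  # should be poly(eps)

# An eps-corrupted sample from the uniform 2-mixture: an adversary may replace
# an eps fraction of the points arbitrarily; here it plants a far-away cluster.
comp = rng.integers(0, 2, size=n)
sample = np.where(comp[:, None] == 0,
                  P.rvs(size=n, random_state=rng),
                  Q.rvs(size=n, random_state=rng))
m = int(eps * n)
sample[:m] = 100.0 + rng.standard_normal((m, d))  # adversarial outliers
```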
Related papers
- Learning general Gaussian mixtures with efficient score matching [16.06356123715737]
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions.
We make no separation assumptions on the underlying mixture components.
We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in time polynomial in the number of samples, and constructs a sampler for the mixture (a sketch of the score-to-sampler step follows below).
arXiv Detail & Related papers (2024-04-29T17:30:36Z)
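As a hedged illustration of the score-to-sampler pipeline referenced in the entry above (not the paper's estimator, which must learn the score from samples): given the exact score $\nabla_x \log p(x)$ of a known one-dimensional Gaussian mixture, unadjusted Langevin dynamics produces approximate samples. The mixture parameters, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
weights = np.array([0.5, 0.5])   # assumed mixture: 0.5 N(-3,1) + 0.5 N(3,1)
means = np.array([-3.0, 3.0])
sigma = 1.0

def score(x):
    # grad_x log p(x) for sum_i w_i N(mu_i, sigma^2): the responsibility-
    # weighted average of the per-component scores (mu_i - x) / sigma^2.
    diffs = means[None, :] - x[:, None]                      # shape (n, k)
    log_comp = -0.5 * (diffs / sigma) ** 2 + np.log(weights)
    resp = np.exp(log_comp - log_comp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * diffs).sum(axis=1) / sigma**2

# Unadjusted Langevin: x <- x + (eta/2) * score(x) + sqrt(eta) * noise.
eta, steps, n = 0.05, 2000, 4000
x = rng.standard_normal(n) * 5.0  # broad initialization covering both modes
for _ in range(steps):
    x = x + 0.5 * eta * score(x) + np.sqrt(eta) * rng.standard_normal(n)

# Rough check: mean near 0, mean absolute value near 3.
print(f"mean = {x.mean():.2f}, abs-mean = {np.abs(x).mean():.2f}")
```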
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z)
- Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures [9.053430799456587]
We study the problem of learning nonparametric distributions in a finite mixture.
We establish tight bounds on the sample complexity for learning the component distributions in such models.
arXiv Detail & Related papers (2022-03-28T23:53:48Z)
- Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures [9.670578317106182]
We consider mixtures of $k\geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated.
We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small.
We develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight.
arXiv Detail & Related papers (2021-12-10T10:51:44Z)
- Clustering Mixtures with Almost Optimal Separation in Polynomial Time [21.509452343745252]
We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions.
It is folklore that separation $\Delta = \Theta(\sqrt{\log k})$ is both necessary and sufficient to recover a good clustering.
We give an algorithm which takes polynomially many samples and time, and which can successfully recover a good clustering (a baseline illustration follows below).
arXiv Detail & Related papers (2021-12-01T18:34:09Z)
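Below is a hedged baseline sketch for the regime in the entry above: draw $k$ spherical Gaussians whose means are separated on the scale $\sqrt{\log k}$ and cluster with plain k-means. Lloyd's k-means is a folklore baseline, not the paper's algorithm, and all problem sizes here are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
k, d, n_per = 8, 50, 300                 # assumed problem sizes
delta = 4.0 * np.sqrt(np.log(k))         # separation scale from the summary

# Place k means at distance ~delta from the origin in random directions;
# in high dimension these are pairwise roughly sqrt(2)*delta apart.
means = rng.standard_normal((k, d))
means *= delta / np.linalg.norm(means, axis=1, keepdims=True)

X = np.concatenate([mu + rng.standard_normal((n_per, d)) for mu in means])
true = np.repeat(np.arange(k), n_per)

pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Accuracy up to relabeling: assign each recovered cluster its majority label.
correct = 0
for c in np.unique(pred):
    labels_c = true[pred == c]
    correct += (labels_c == np.bincount(labels_c).argmax()).sum()
print(f"clustering accuracy (up to relabeling): {correct / len(true):.2%}")
```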
- Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
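As a hedged baseline for the corruption model in this entry (not the paper's nearly-linear-time estimator): draw samples from $Y = \langle X, w^* \rangle + \epsilon$, adversarially corrupt a fraction of the responses, and compare ordinary least squares against Huber regression, a standard robust baseline. All parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(3)
n, d, eta = 2000, 20, 0.1                # assumed sizes and corruption fraction
w_star = rng.standard_normal(d)

X = rng.standard_normal((n, d))          # sub-Gaussian, identity second moment
y = X @ w_star + 0.1 * rng.standard_normal(n)
m = int(eta * n)
y[:m] = 50.0 * rng.standard_normal(m)    # adversarial response corruptions

ols = LinearRegression(fit_intercept=False).fit(X, y)
hub = HuberRegressor(fit_intercept=False, epsilon=1.35, max_iter=1000).fit(X, y)
print("OLS   parameter error:", np.linalg.norm(ols.coef_ - w_star))
print("Huber parameter error:", np.linalg.norm(hub.coef_ - w_star))
```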
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs).
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.