A Fourier Approach to Mixture Learning
- URL: http://arxiv.org/abs/2210.02415v2
- Date: Thu, 6 Oct 2022 03:43:05 GMT
- Title: A Fourier Approach to Mixture Learning
- Authors: Mingda Qiao, Guru Guruganesh, Ankit Singh Rawat, Avinava Dubey, Manzil
Zaheer
- Abstract summary: We give an algorithm that efficiently learns the means in $d = O(\log k/\log\log k)$ dimensions under separation $d/\sqrt{\log k}$ (modulo doubly logarithmic factors).
Our results can be easily extended to learning mixtures of non-Gaussian distributions, under a mild condition on the Fourier spectrum of the distribution.
- Score: 46.995354373649675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the problem of learning mixtures of spherical Gaussians. Given
samples from mixture $\frac{1}{k}\sum_{j=1}^{k}\mathcal{N}(\mu_j, I_d)$, the
goal is to estimate the means $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^d$ up
to a small error. The hardness of this learning problem can be measured by the
separation $\Delta$ defined as the minimum distance between all pairs of means.
Regev and Vijayaraghavan (2017) showed that with $\Delta = \Omega(\sqrt{\log
k})$ separation, the means can be learned using $\mathrm{poly}(k, d)$ samples,
whereas super-polynomially many samples are required if $\Delta = o(\sqrt{\log
k})$ and $d = \Omega(\log k)$. This leaves open the low-dimensional regime
where $d = o(\log k)$.
In this work, we give an algorithm that efficiently learns the means in $d =
O(\log k/\log\log k)$ dimensions under separation $d/\sqrt{\log k}$ (modulo
doubly logarithmic factors). This separation is strictly smaller than
$\sqrt{\log k}$, and is also shown to be necessary. Along with the results of
Regev and Vijayaraghavan (2017), our work almost pins down the critical
separation threshold at which efficient parameter learning becomes possible for
spherical Gaussian mixtures. More generally, our algorithm runs in time
$\mathrm{poly}(k)\cdot f(d, \Delta, \epsilon)$, and is thus fixed-parameter
tractable in parameters $d$, $\Delta$ and $\epsilon$.
Our approach is based on estimating the Fourier transform of the mixture at
carefully chosen frequencies, and both the algorithm and its analysis are
simple and elementary. Our positive results can be easily extended to learning
mixtures of non-Gaussian distributions, under a mild condition on the Fourier
spectrum of the distribution.
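  Below is a minimal sketch (not the authors' implementation) of the basic primitive the abstract describes: estimating the Fourier transform (characteristic function) of the mixture at a set of frequencies. For the mixture $\frac{1}{k}\sum_{j=1}^{k}\mathcal{N}(\mu_j, I_d)$, the characteristic function factorizes as $\phi(\xi) = e^{-\|\xi\|^2/2}\cdot\frac{1}{k}\sum_j e^{i\langle\xi, \mu_j\rangle}$, so dividing the empirical estimate by the Gaussian envelope isolates an exponential sum over the unknown means. The frequency choice below is purely illustrative, not the carefully chosen scheme from the paper.

```python
# Sketch only: empirical characteristic function of a spherical Gaussian
# mixture, evaluated at a few frequencies, with the Gaussian envelope
# exp(-||xi||^2 / 2) divided out. Frequencies and constants are illustrative.
import numpy as np

def empirical_fourier(samples: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """hat{phi}(xi) = (1/n) * sum_j exp(i <xi, x_j>).

    samples: (n, d) array of mixture samples.
    freqs:   (m, d) array of frequency vectors xi.
    Returns an (m,) complex array, one estimate per frequency.
    """
    # (m, n) matrix of inner products <xi, x_j>, then average over samples.
    return np.exp(1j * freqs @ samples.T).mean(axis=1)

def deconvolved_fourier(samples: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Divide out exp(-||xi||^2 / 2), leaving an estimate of
    (1/k) * sum_j exp(i <xi, mu_j>), the exponential sum over the means."""
    envelope = np.exp(-0.5 * np.sum(freqs ** 2, axis=1))
    return empirical_fourier(samples, freqs) / envelope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, n = 2, 3, 200_000
    means = rng.normal(scale=3.0, size=(k, d))         # hypothetical means
    labels = rng.integers(k, size=n)
    samples = means[labels] + rng.normal(size=(n, d))   # spherical Gaussian mixture
    freqs = rng.normal(scale=0.5, size=(8, d))          # illustrative frequencies
    est = deconvolved_fourier(samples, freqs)
    truth = np.exp(1j * freqs @ means.T).mean(axis=1)   # (1/k) * sum_j exp(i <xi, mu_j>)
    print(np.abs(est - truth).max())                    # small for large n, moderate ||xi||
```

  With enough samples these estimates concentrate around the exponential sum over the means; actually recovering the $\mu_j$ from such evaluations is where the paper's careful choice of frequencies comes in.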
Related papers
- Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector.
We show that a proper learner attains $L^2$-error of $O(\mathrm{OPT})+\epsilon$.
arXiv Detail & Related papers (2024-11-08T17:10:38Z) - Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - SQ Lower Bounds for Learning Mixtures of Linear Classifiers [43.63696593768504]
We show that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures.
The key technical ingredient is a new construction of spherical designs that may be of independent interest.
arXiv Detail & Related papers (2023-10-18T10:56:57Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\cdot\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures [9.053430799456587]
We study the problem of learning nonparametric distributions in a finite mixture.
We establish tight bounds on the sample complexity for learning the component distributions in such models.
arXiv Detail & Related papers (2022-03-28T23:53:48Z) - Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
arXiv Detail & Related papers (2021-12-18T02:11:36Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Learning Mixtures of Spherical Gaussians via Fourier Analysis [0.5381004207943596]
We find that a bound on the sample and computational complexity was previously unknown when $\omega(1) \leq d \leq O(\log k)$.
These authors also show that the means of a random mixture of Gaussians with centers in a ball of radius $\Theta(\sqrt{d})$ in $d$ dimensions can be learned given at least $\mathrm{poly}(k, \frac{1}{\delta})$ samples.
arXiv Detail & Related papers (2020-04-13T08:06:29Z)