Learning Mixtures of Spherical Gaussians via Fourier Analysis
- URL: http://arxiv.org/abs/2004.05813v2
- Date: Sat, 11 Jul 2020 09:08:11 GMT
- Title: Learning Mixtures of Spherical Gaussians via Fourier Analysis
- Authors: Somnath Chakraborty, Hariharan Narayanan
- Abstract summary: Such a bound on the sample and computational complexity was previously unknown when $\omega(1) \leq d \leq O(\log k)$.
Regev and Vijayaraghavan also show that the sample complexity of learning a random mixture of gaussians in a ball of radius $\Theta(\sqrt{d})$ in $d$ dimensions, when $d$ is $\Theta(\log k)$, is at least $poly(k, \frac{1}{\delta})$, so the result is tight in this case.
- Score: 0.5381004207943596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Suppose that we are given independent, identically distributed samples $x_l$
from a mixture $\mu$ of no more than $k$ of $d$-dimensional spherical gaussian
distributions $\mu_i$ with variance $1$, such that the minimum $\ell_2$
distance between two distinct centers $y_l$ and $y_j$ is greater than $\sqrt{d}
\Delta$ for some $c \leq \Delta $, where $c\in (0,1)$ is a small positive
universal constant. We develop a randomized algorithm that learns the centers
$y_l$ of the gaussians, to within an $\ell_2$ distance of $\delta <
\frac{\Delta\sqrt{d}}{2}$ and the weights $w_l$ to within $cw_{min}$ with
probability greater than $1 - \exp(-k/c)$. The number of samples and the
computational time are bounded above by $poly(k, d, \frac{1}{\delta})$. Such a
bound on the sample and computational complexity was previously unknown when
$\omega(1) \leq d \leq O(\log k)$. When $d = O(1)$, this follows from work of
Regev and Vijayaraghavan. These authors also show that the sample complexity of
learning a random mixture of gaussians in a ball of radius $\Theta(\sqrt{d})$
in $d$ dimensions, when $d$ is $\Theta( \log k)$ is at least $poly(k,
\frac{1}{\delta})$, showing that our result is tight in this case.
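To make the Fourier-analytic approach concrete, here is a minimal one-dimensional sketch (an illustration of the general idea only, not the paper's algorithm or parameter choices): the empirical characteristic function of a unit-variance gaussian mixture is approximately $\sum_j w_j e^{i\xi y_j} e^{-\xi^2/2}$, so dividing out the gaussian factor leaves a sum of complex exponentials whose frequencies are the centers, and peaks of its inverse transform estimate the $y_j$.

```python
# A 1-D toy sketch of Fourier-based center recovery; the frequency window,
# grid sizes, and sample size below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: two well-separated unit-variance gaussians.
centers, weights, n = np.array([-3.0, 4.0]), np.array([0.5, 0.5]), 5000
comp = rng.choice(len(centers), size=n, p=weights)
x = centers[comp] + rng.standard_normal(n)

# Empirical characteristic function on a bounded frequency grid.
xi = np.linspace(-2.0, 2.0, 401)
ecf = np.exp(1j * np.outer(xi, x)).mean(axis=1)

# Deconvolve the gaussian factor exp(-xi^2/2).  The O(n^{-1/2}) empirical
# error is amplified by exp(xi^2/2), which is why |xi| must stay bounded.
deconv = ecf * np.exp(xi ** 2 / 2.0)

# Inverse transform onto a spatial grid; local maxima estimate the centers.
t = np.linspace(-8.0, 8.0, 1601)
f = (deconv[None, :] * np.exp(-1j * np.outer(t, xi))).real.mean(axis=1)
peaks = (f > np.roll(f, 1)) & (f > np.roll(f, -1)) & (f > 0.5 * f.max())
print("estimated centers ~", t[peaks])
```

Balancing the frequency window against the $e^{\xi^2/2}$ noise amplification, and doing so in $d$ dimensions, is where the actual algorithm and its $poly(k, d, \frac{1}{\delta})$ bound do their work.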
Related papers
- Simple and Nearly-Optimal Sampling for Rank-1 Tensor Completion via Gauss-Jordan [49.1574468325115]
We revisit the sample and computational complexity of completing a rank-1 tensor in $\otimes_{i=1}^{N} \mathbb{R}^{d}$.
We present a characterization of the problem which admits an algorithm amounting to Gauss-Jordan on a pair of random linear systems (a toy log-linearization is sketched after this entry).
arXiv Detail & Related papers (2024-08-10T04:26:19Z)
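The sketch below illustrates the reduction behind the entry above (a hedged, standard log-linearization; the paper's actual pair of random linear systems and its Gauss-Jordan analysis are not reproduced here): a rank-1 tensor with positive entries becomes, entrywise in logs, a linear system in the log-factors.

```python
# Toy rank-1 tensor completion via a linear system in log-space; the
# positivity assumption and the sampling scheme here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
d = 5
u, v, w = (rng.uniform(0.5, 2.0, d) for _ in range(3))
T = np.einsum("i,j,k->ijk", u, v, w)          # ground-truth rank-1 tensor

# Observe random entries; T[i,j,k] = u[i]*v[j]*w[k] means
# log T[i,j,k] = log u[i] + log v[j] + log w[k]: one linear equation each.
m = 60
obs = rng.integers(0, d, size=(m, 3))
A = np.zeros((m, 3 * d))
for r, (i, j, k) in enumerate(obs):
    A[r, i] = A[r, d + j] = A[r, 2 * d + k] = 1.0
b = np.log(T[obs[:, 0], obs[:, 1], obs[:, 2]])

# The system is rank-deficient (rescaling u up and v down preserves T);
# any consistent solution reproduces T, so least squares suffices.
sol = np.linalg.lstsq(A, b, rcond=None)[0]
T_hat = np.einsum("i,j,k->ijk", *np.split(np.exp(sol), 3))
print("max abs completion error:", np.abs(T - T_hat).max())
```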
- Outlier Robust Multivariate Polynomial Regression [27.03423421704806]
We are given a set of random samples $(\mathbf{x}_i, y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i, p(\mathbf{x}_i))$.
The goal is to output a $\hat{p}$ within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$.
arXiv Detail & Related papers (2024-03-14T15:04:45Z)
- $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits.
For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z)
- A Fourier Approach to Mixture Learning [46.995354373649675]
We give an algorithm that efficiently learns the means in $d = O(\log k/\log\log k)$ dimensions under separation $d/\sqrt{\log k}$ (modulo polylogarithmic factors).
Our results can be easily extended to learning mixtures of non-Gaussian distributions, under a mild condition on the Fourier spectrum of the distribution.
arXiv Detail & Related papers (2022-10-05T17:35:46Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication, and they need a factor of $b$ fewer bits for storage compared to classical random features (a toy construction is sketched after this entry).
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
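A minimal sketch of the ternary construction referenced above, with the activation and storage details assumed rather than taken from the paper:

```python
# Toy ternary random features: weights in {-1, 0, +1}, so the projection
# uses only additions/subtractions and each weight fits in 2 bits.
import numpy as np

rng = np.random.default_rng(2)

def ternary_random_features(X, n_features, sparsity=0.5):
    """Map rows of X through sign(W @ x) with W entries in {-1, 0, +1}."""
    p = (1.0 - sparsity) / 2.0        # P(-1) = P(+1) = p, P(0) = sparsity
    W = rng.choice([-1, 0, 1], size=(n_features, X.shape[1]), p=[p, sparsity, p])
    return np.sign(X @ W.T)           # the sign activation is an assumption

X = rng.standard_normal((200, 32))
Phi = ternary_random_features(X, n_features=512)
K = Phi @ Phi.T / Phi.shape[1]        # random-feature kernel estimate
print(Phi.shape, K.shape)
```

The universality claim in the entry is what licenses this substitution: if the kernel spectrum depends on the i.i.d. weight distribution only through its moments, cheap ternary weights can stand in for gaussian ones.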
- The EM Algorithm is Adaptively-Optimal for Unbalanced Symmetric Gaussian Mixtures [36.91281862322494]
We show that the EM algorithm adaptively achieves the minimax error rate $\tilde{O}\Big(\min\Big\{\frac{1}{1-2\delta_*}\sqrt{\frac{d}{n}},\ \frac{1}{\|\theta_*\|}\sqrt{\frac{d}{n}},\ \big(\frac{d}{n}\big)^{1/4}\Big\}\Big)$ in no more than $O\big(\frac{1}{\|\theta_*\|^2}\big)$ iterations (a toy EM iteration is sketched after this entry).
arXiv Detail & Related papers (2021-03-29T14:28:17Z)
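A toy implementation of the EM iteration for the unbalanced symmetric model studied above, $x \sim \delta_* N(\theta_*, I_d) + (1-\delta_*) N(-\theta_*, I_d)$ with known $\delta_*$ (a sketch under assumed conventions; the entry's contribution is the adaptive minimax analysis, not this recursion):

```python
# EM for a two-component symmetric gaussian mixture with known weight
# delta; only the mean parameter theta is updated.
import numpy as np

rng = np.random.default_rng(3)
d, n, delta = 10, 5000, 0.3
theta_true = rng.standard_normal(d)
signs = np.where(rng.random(n) < delta, 1.0, -1.0)
X = signs[:, None] * theta_true + rng.standard_normal((n, d))

def em_step(theta, X, delta):
    # E-step: the gaussian likelihood ratio between the +theta and -theta
    # components collapses to a logistic function of <x, theta>.
    logit = 2.0 * X @ theta + np.log(delta / (1.0 - delta))
    gamma = 1.0 / (1.0 + np.exp(-logit))
    # M-step: responsibility-signed average of the samples.
    return ((2.0 * gamma - 1.0)[:, None] * X).mean(axis=0)

theta = 0.1 * rng.standard_normal(d)   # hypothetical initialization
for _ in range(50):
    theta = em_step(theta, X, delta)
print("estimation error:", np.linalg.norm(theta - theta_true))
```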
- The Sparse Hausdorff Moment Problem, with Application to Topic Models [5.151973524974052]
We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables.
It suffices to know the moments to additive accuracy $w_{\min}\cdot\zeta^{O(k)}$.
arXiv Detail & Related papers (2020-07-16T04:23:57Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
- Sample Amplification: Increasing Dataset Size even when Learning is Impossible [15.864702679819544]
Given data drawn from an unknown distribution, $D$, to what extent is it possible to "amplify" this dataset and output an even larger set of samples that appear to have been drawn from $D$?
We formalize this question and show, for example, that an $\left(n,\, n + \Theta(n/\sqrt{k})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples (a naive amplifier is sketched after this entry).
arXiv Detail & Related papers (2019-04-26T21:42:44Z)
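A naive amplifier in the spirit of the entry above, for the gaussian setting it mentions (heavily hedged: the paper's procedure and its total-variation analysis are more careful, and the count of extra samples below is only schematic):

```python
# Toy sample amplification for an unknown N(mu, I_d): return the inputs
# plus extra draws centered at the empirical mean.  The procedure and the
# schematic rate m ~ n/sqrt(d) are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(4)

def amplify(samples, m):
    mu_hat = samples.mean(axis=0)     # the only statistic this sketch uses
    extra = mu_hat + rng.standard_normal((m, samples.shape[1]))
    return np.vstack([samples, extra])

d, n = 100, 30                        # n << d: learning mu is hopeless here
X = rng.standard_normal((n, d)) + 5.0 # unknown true mean (5, ..., 5)
X_amp = amplify(X, m=max(1, n // int(np.sqrt(d))))
print(X.shape, "->", X_amp.shape)
```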
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.