Learning Mixtures of Gaussians with Censored Data
- URL: http://arxiv.org/abs/2305.04127v2
- Date: Thu, 29 Jun 2023 00:31:58 GMT
- Title: Learning Mixtures of Gaussians with Censored Data
- Authors: Wai Ming Tai, Bryon Aragam
- Abstract summary: We study the problem of learning mixtures of Gaussians with censored data.
We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.
- Score: 9.053430799456587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of learning mixtures of Gaussians with censored data.
Statistical learning with censored data is a classical problem with numerous
practical applications; however, finite-sample guarantees are missing even for
simple latent variable models such as Gaussian mixtures. Formally, we are
given censored data from a mixture of univariate Gaussians $$
\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e. the sample is observed
only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and
the means $\mu_i$. We propose an algorithm that takes only
$\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the
means $\mu_i$ within $\varepsilon$ error.
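To make the data model concrete, below is a minimal sketch of drawing censored samples from such a mixture. It is not the paper's algorithm; the interval censoring set $S = [a,b]$, the parameter values, and the helper name are illustrative assumptions only.
```python
import numpy as np

def sample_censored_mixture(weights, means, sigma, S=(-1.0, 1.0), n=10_000, seed=0):
    """Draw n censored samples from sum_i w_i N(mu_i, sigma^2).

    A draw is observed only if it lies inside the censoring set S,
    taken here to be an interval [a, b] purely for illustration.
    """
    rng = np.random.default_rng(seed)
    a, b = S
    kept = []
    while len(kept) < n:
        # Pick a component according to the mixing weights w_i,
        # then draw from N(mu_i, sigma^2).
        i = rng.choice(len(weights), p=weights)
        x = rng.normal(means[i], sigma)
        # Censoring: the sample is observed only if it lies inside S.
        if a <= x <= b:
            kept.append(x)
    return np.array(kept)

# Example: a two-component mixture with equal weights.
samples = sample_censored_mixture(weights=[0.5, 0.5], means=[-0.5, 0.5], sigma=1.0)
print(samples.mean(), samples.std())
```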
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples [9.649879910148854]
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP)
Our main result is that $\text{poly}(k, d, 1/\alpha, 1/\varepsilon, \log(1/\delta))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $\alpha$.
This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs.
arXiv Detail & Related papers (2023-09-07T17:02:32Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Gaussian Mean Testing Made Simple [46.03021473600576]
Given i.i.d. samples from a distribution $p$ on $\mathbb{R}^d$, the task is to distinguish, with high probability, between the following cases.
We give an extremely simple algorithm for Gaussian mean testing with a one-page analysis.
arXiv Detail & Related papers (2022-10-25T01:59:13Z) - A Fourier Approach to Mixture Learning [46.995354373649675]
We give an algorithm that efficiently learns the means in $d = O(\log k/\log\log k)$ dimensions under separation $d/\sqrt{\log k}$ (modulo factors).
Our results can be easily extended to learning mixtures of non-Gaussian distributions, under a mild condition on the Fourier spectrum of the distribution.
arXiv Detail & Related papers (2022-10-05T17:35:46Z) - Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
arXiv Detail & Related papers (2021-12-18T02:11:36Z) - Privately Learning Mixtures of Axis-Aligned Gaussians [12.77304082363491]
We show that $\widetilde{O}(k^2 d \log^{3/2}(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians.
We design a new technique for privately learning mixture distributions.
arXiv Detail & Related papers (2021-06-03T22:34:23Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian with unknown covariance, $X \sim \mathcal{N}(0, \Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\widetilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)