Privately Learning Mixtures of Axis-Aligned Gaussians
- URL: http://arxiv.org/abs/2106.02162v1
- Date: Thu, 3 Jun 2021 22:34:23 GMT
- Title: Privately Learning Mixtures of Axis-Aligned Gaussians
- Authors: Ishaq Aden-Ali, Hassan Ashtiani, Christopher Liaw
- Abstract summary: We show that $\widetilde{O}(k^2 d \log^{3/2}(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians.
We design a new technique for privately learning mixture distributions.
- Score: 12.77304082363491
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of learning mixtures of Gaussians under the
constraint of approximate differential privacy. We prove that
$\widetilde{O}(k^2 d \log^{3/2}(1/\delta) / \alpha^2 \varepsilon)$ samples are
sufficient to learn a mixture of $k$ axis-aligned Gaussians in $\mathbb{R}^d$
to within total variation distance $\alpha$ while satisfying $(\varepsilon,
\delta)$-differential privacy. This is the first result for privately learning
mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If
the covariance matrix of each of the Gaussians is the identity matrix, we
show that $\widetilde{O}(kd/\alpha^2 + kd \log(1/\delta) / \alpha \varepsilon)$
samples are sufficient.
Recently, the "local covering" technique of Bun, Kamath, Steinke, and Wu has
been successfully used for privately learning high-dimensional Gaussians with a
known covariance matrix and extended to privately learning general
high-dimensional Gaussians by Aden-Ali, Ashtiani, and Kamath. Given these
positive results, this approach has been proposed as a promising direction for
privately learning mixtures of Gaussians. Unfortunately, we show that this is
not possible.
We design a new technique for privately learning mixture distributions. A
class of distributions $\mathcal{F}$ is said to be list-decodable if there is
an algorithm that, given "heavily corrupted" samples from $f\in \mathcal{F}$,
outputs a list of distributions, $\widehat{\mathcal{F}}$, such that one of the
distributions in $\widehat{\mathcal{F}}$ approximates $f$. We show that if
$\mathcal{F}$ is privately list-decodable, then we can privately learn mixtures
of distributions in $\mathcal{F}$. Finally, we show axis-aligned Gaussian
distributions are privately list-decodable, thereby proving mixtures of such
distributions are privately learnable.
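To make the list-decoding route concrete, below is a minimal LaTeX sketch of the two statements the abstract alludes to: a definition of private list-decodability and the resulting reduction to learning mixtures. The parameter names ($\gamma$ for the inlier fraction, $L$ for the list size, $w_{\min}$ for the smallest mixing weight) and the private selection step used to finish the reduction are illustrative assumptions, not necessarily the paper's exact conventions.
```latex
\documentclass{article}
\usepackage{amsmath,amssymb,amsthm}
\newtheorem{definition}{Definition}
\newtheorem{proposition}{Proposition}
\begin{document}

% Hedged sketch: gamma (inlier fraction), L (list size), w_min (smallest mixing
% weight), and the final selection step are illustrative assumptions; the paper's
% formal definitions may use different conventions.

\begin{definition}[Private list-decodability, sketch]
A class $\mathcal{F}$ is $(\varepsilon,\delta)$-privately list-decodable if there is an
$(\varepsilon,\delta)$-differentially private algorithm that, given $n$ samples of which
only a $\gamma$-fraction is drawn i.i.d.\ from some $f \in \mathcal{F}$ (the remaining
samples being arbitrary), outputs a list $\widehat{\mathcal{F}}$ of size at most $L$ such
that, with high probability, some $\hat{f} \in \widehat{\mathcal{F}}$ satisfies
$d_{\mathrm{TV}}(f, \hat{f}) \le \alpha$.
\end{definition}

\begin{proposition}[Reduction, informal]
Suppose $\mathcal{F}$ is privately list-decodable. For a $k$-mixture
$f = \sum_{i=1}^{k} w_i f_i$ with $f_i \in \mathcal{F}$ and $w_i \ge w_{\min}$, each
component $f_i$ generates at least a $w_{\min}$-fraction of the samples, so it appears
as a ``heavily corrupted'' source to the list-decoder; running the private list-decoder
therefore yields a list containing a distribution $\alpha$-close to every component. A
private selection step over the finitely many candidate mixtures assembled from this
list then outputs a mixture that is $O(\alpha)$-close to $f$ in total variation distance.
\end{proposition}

\end{document}
```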
Related papers
- Sample-Efficient Private Learning of Mixtures of Gaussians [11.392841244701605]
We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians.
Our work improves over the previous best result [AAL24b] and is provably optimal when $d$ is much larger than $k^2$.
arXiv Detail & Related papers (2024-11-04T17:23:52Z) - Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples [9.649879910148854]
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP).
Our main result is that $\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $\alpha$.
This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs.
arXiv Detail & Related papers (2023-09-07T17:02:32Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Learning Mixtures of Gaussians with Censored Data [9.053430799456587]
We study the problem of learning mixtures of Gaussians with censored data.
We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.
arXiv Detail & Related papers (2023-05-06T19:58:02Z) - Robust Mean Estimation Without Moments for Symmetric Distributions [7.105512316884493]
We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently.
We propose a sequence of efficient algorithms that approaches this optimal error.
Our algorithms are based on a generalization of the well-known filtering technique.
arXiv Detail & Related papers (2023-02-21T17:52:23Z) - Private High-Dimensional Hypothesis Testing [4.133655523622441]
We provide improved differentially private algorithms for identity testing of high-dimensional distributions.
Specifically, we can test whether the distribution comes from $\mathcal{N}(\mu^*, \Sigma)$ for some fixed $\mu^*$ or from some $\mathcal{N}(\mu, \Sigma)$ at total variation distance at least $\alpha$ from $\mathcal{N}(\mu^*, \Sigma)$.
arXiv Detail & Related papers (2022-03-03T06:25:48Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture to within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)