Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance
Estimation for Subgaussian Distributions
- URL: http://arxiv.org/abs/2301.12250v2
- Date: Tue, 25 Apr 2023 19:40:02 GMT
- Title: Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance
Estimation for Subgaussian Distributions
- Authors: Gavin Brown, Samuel B. Hopkins and Adam Smith
- Abstract summary: We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation.
Our algorithm produces $\tilde{\mu}$ such that $\|\mu - \tilde{\mu}\|_{\Sigma} \leq \alpha$ as long as $n \gtrsim \tfrac{d}{\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha \varepsilon} + \frac{d \log 1/\delta}{\varepsilon}$.
- Score: 8.40077201352607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a fast, differentially private algorithm for high-dimensional
covariance-aware mean estimation with nearly optimal sample complexity. Only
exponential-time estimators were previously known to achieve this guarantee.
Given $n$ samples from a (sub-)Gaussian distribution with unknown mean $\mu$
and covariance $\Sigma$, our $(\varepsilon,\delta)$-differentially private
estimator produces $\tilde{\mu}$ such that $\|\mu - \tilde{\mu}\|_{\Sigma} \leq
\alpha$ as long as $n \gtrsim \tfrac d {\alpha^2} + \tfrac{d \sqrt{\log
1/\delta}}{\alpha \varepsilon}+\frac{d\log 1/\delta}{\varepsilon}$. The
Mahalanobis error metric $\|\mu - \hat{\mu}\|_{\Sigma}$ measures the distance
between $\hat \mu$ and $\mu$ relative to $\Sigma$; it characterizes the error
of the sample mean. Our algorithm runs in time $\tilde{O}(nd^{\omega - 1} +
nd/\varepsilon)$, where $\omega < 2.38$ is the matrix multiplication exponent.
We adapt an exponential-time approach of Brown, Gaboardi, Smith, Ullman, and
Zakynthinou (2021), giving efficient variants of stable mean and covariance
estimation subroutines that also improve the sample complexity to the nearly
optimal bound above.
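To make the error metric concrete, here is a minimal NumPy sketch (the setup and all names are illustrative, not from the paper) that computes the Mahalanobis error $\|\mu - \hat{\mu}\|_{\Sigma} = \sqrt{(\mu - \hat{\mu})^\top \Sigma^{-1} (\mu - \hat{\mu})}$ and checks that the non-private sample mean's error concentrates around $\sqrt{d/n}$, the first term of the sample-complexity bound:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 5000

# Synthetic Gaussian data with a non-trivial covariance.
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)
X = rng.multivariate_normal(mu, Sigma, size=n)

def mahalanobis_error(mu, mu_hat, Sigma):
    """||mu - mu_hat||_Sigma = sqrt((mu - mu_hat)^T Sigma^{-1} (mu - mu_hat))."""
    diff = mu - mu_hat
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

mu_hat = X.mean(axis=0)  # non-private sample mean
print(mahalanobis_error(mu, mu_hat, Sigma))  # concentrates near sqrt(d/n)
print(np.sqrt(d / n))
```

The metric is affine-invariant: mapping the data through any invertible $B$ (so $\mu \mapsto B\mu$ and $\Sigma \mapsto B \Sigma B^\top$) leaves the error unchanged, which is why guarantees in this norm are called affine-invariant.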
Our stable covariance estimator can be turned into a private covariance
estimator for unrestricted subgaussian distributions. With $n \gtrsim d^{3/2}$
samples, our estimate is accurate in spectral norm. This is the first such
algorithm using $n= o(d^2)$ samples, answering an open question posed by Alabi
et al. (2022). With $n\gtrsim d^2$ samples, our estimate is accurate in
Frobenius norm. This leads to a fast, nearly optimal algorithm for private
learning of unrestricted Gaussian distributions in TV distance.
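For intuition on the last claim, a standard bound (not stated in the abstract; it holds up to constants) controls the TV distance between Gaussians by the Mahalanobis distance of the means plus the relative Frobenius error of the covariances:

$$d_{\mathrm{TV}}\bigl(\mathcal{N}(\mu, \Sigma),\, \mathcal{N}(\tilde{\mu}, \tilde{\Sigma})\bigr) \lesssim \|\mu - \tilde{\mu}\|_{\Sigma} + \bigl\|\Sigma^{-1/2} \tilde{\Sigma} \Sigma^{-1/2} - I\bigr\|_{F},$$

so Frobenius-accurate private covariance estimation combined with covariance-aware mean estimation yields private learning in TV distance.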
Duchi, Haque, and Kuditipudi (2023) obtained similar results independently
and concurrently.
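For contrast with the covariance-aware guarantee, the sketch below shows the classic Gaussian-mechanism baseline for private mean estimation (this is not the paper's algorithm; the clipping radius $R$ and all names are illustrative). Because it adds isotropic noise, its error guarantee is in Euclidean norm and is not affine-invariant: rescaling a single coordinate of the data rescales the error in that coordinate, which is exactly what the Mahalanobis guarantee above avoids.

```python
import numpy as np

def gaussian_mechanism_mean(X, R, eps, delta, rng):
    """Baseline (eps, delta)-DP mean estimate: clip each row to L2 norm R,
    average, and add isotropic Gaussian noise.

    The replace-one sensitivity of the clipped mean is 2R/n, and the
    classic calibration sigma = sensitivity * sqrt(2 ln(1.25/delta)) / eps
    gives (eps, delta)-DP for eps <= 1.
    """
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    X_clipped = X * np.minimum(1.0, R / norms)
    sigma = (2.0 * R / n) * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return X_clipped.mean(axis=0) + rng.normal(scale=sigma, size=d)
```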
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators for distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Better and Simpler Lower Bounds for Differentially Private Statistical Estimation [7.693388437377614]
We prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples.
Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \cdots\right)$ samples.
arXiv Detail & Related papers (2023-10-10T04:02:43Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
We study group distributionally robust optimization (GDRO).
We use online learning techniques to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
We also give a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- A Fast Algorithm for Adaptive Private Mean Estimation [5.090363690988394]
We design an $(\varepsilon, \delta)$-differentially private algorithm that is adaptive to $\Sigma$.
The estimator achieves optimal rates of convergence with respect to the induced Mahalanobis norm $\|\cdot\|_{\Sigma}$.
arXiv Detail & Related papers (2023-01-17T18:44:41Z)
- Private High-Dimensional Hypothesis Testing [4.133655523622441]
We provide improved differentially private algorithms for identity testing of high-dimensional distributions.
Specifically, we can test whether the distribution comes from $\mathcal{N}(\mu^*, \Sigma)$ for some fixed $\mu^*$, or from some $\mathcal{N}(\mu', \Sigma)$ at total variation distance at least $\alpha$ from it.
arXiv Detail & Related papers (2022-03-03T06:25:48Z)
- Covariance-Aware Private Mean Estimation Without Private Covariance Estimation [10.036088581191592]
We present two sample-efficient differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions.
Our estimators output $\tilde{\mu}$ such that $\|\tilde{\mu} - \mu\|_{\Sigma} \leq \alpha$, where $\|\cdot\|_{\Sigma}$ is the Mahalanobis distance.
arXiv Detail & Related papers (2021-06-24T21:40:07Z)
- The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$ …
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance; and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is …
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)