Better and Simpler Lower Bounds for Differentially Private Statistical
Estimation
- URL: http://arxiv.org/abs/2310.06289v2
- Date: Thu, 4 Jan 2024 07:36:29 GMT
- Title: Better and Simpler Lower Bounds for Differentially Private Statistical
Estimation
- Authors: Shyam Narayanan
- Abstract summary: We prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples.
Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples.
- Score: 7.693388437377614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide optimal lower bounds for two well-known parameter estimation (also
known as statistical estimation) tasks in high dimensions with approximate
differential privacy. First, we prove that for any $\alpha \le O(1)$,
estimating the covariance of a Gaussian up to spectral error $\alpha$ requires
$\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} +
\frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors.
This result improves over previous work, which established the bound only for $\alpha \le
O\left(\frac{1}{\sqrt{d}}\right)$, and our proof is also simpler.
Next, we prove that estimating the mean of a heavy-tailed distribution with
bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)}
\varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work for this
problem was only able to establish this lower bound against pure differential
privacy, or in the special case of $k = 2$.
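As a quick sanity check on how to read the heavy-tailed bound (our own arithmetic on the stated formula, not an additional claim from the paper), the exponent $k/(k-1)$ interpolates between two familiar regimes:

```latex
% First term of the lower bound: d / (alpha^{k/(k-1)} * varepsilon).
% k = 2 (only a bounded second moment), so k/(k-1) = 2:
\tilde{\Omega}\!\left(\frac{d}{\alpha^{2} \varepsilon} + \frac{d}{\alpha^{2}}\right)
% k -> infinity (all moments bounded), so k/(k-1) -> 1:
\tilde{\Omega}\!\left(\frac{d}{\alpha \varepsilon} + \frac{d}{\alpha^{2}}\right)
```

That is, weaker moment assumptions make the private term strictly more expensive, degrading smoothly from $\alpha^{-1}$ toward $\alpha^{-2}$ as $k$ drops to $2$.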
Our techniques follow the method of fingerprinting and are generally quite
simple. Our lower bound for heavy-tailed estimation is based on a black-box
reduction from privately estimating identity-covariance Gaussians. Our lower
bound for covariance estimation utilizes a Bayesian approach to show that,
under an Inverse Wishart prior distribution for the covariance matrix, no
private estimator can be accurate even in expectation, without sufficiently
many samples.
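The method of fingerprinting referenced above is, at its core, a tracing (membership-inference) argument: an estimator that is accurate must be noticeably correlated with the individual samples it was computed from, while a differentially private estimator can only be weakly correlated with any single sample, so accuracy and privacy together force $n$ to be large. The minimal NumPy sketch below (illustrative only; the $\{\pm 1\}^d$ prior, the non-private empirical-mean estimator, and all names are our own assumptions, not the paper's construction) simulates the correlation statistic that drives such arguments:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 200, 50, 500  # dimension, sample size, Monte Carlo repetitions

in_scores, out_scores = [], []
for _ in range(trials):
    mu = rng.choice([-1.0, 1.0], size=d)   # unknown mean with +-1 coordinates
    X = mu + rng.standard_normal((n, d))   # n samples from N(mu, I_d)
    mu_hat = X.mean(axis=0)                # accurate but non-private estimator
    fresh = mu + rng.standard_normal(d)    # independent point, not in the sample
    # Tracing statistic: correlation of the estimator's error with a point.
    in_scores.append(np.dot(mu_hat - mu, X[0] - mu))
    out_scores.append(np.dot(mu_hat - mu, fresh - mu))

print(f"in-sample score:  {np.mean(in_scores):.2f}  (expected d/n = {d/n:.2f})")
print(f"out-sample score: {np.mean(out_scores):.2f}  (expected 0)")
```

The in-sample score concentrates around $d/n$ while the out-of-sample score stays near zero, so for small $n$ the statistic identifies which points were used; a private estimator must keep every per-sample score small, and quantifying that tension is what produces private terms of the form $\tilde{\Omega}(d/(\alpha \varepsilon))$.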
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset.
Here we study the complexity of NGCA in the Sum-of-Squares framework.
arXiv Detail & Related papers (2024-10-28T18:19:13Z) - $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z) - Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance
Estimation for Subgaussian Distributions [8.40077201352607]
We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation.
Our algorithm produces $\tilde{\mu}$ such that $\|\tilde{\mu} - \mu\|_\Sigma \leq \alpha$ as long as $n \gtrsim \tfrac{d}{\alpha^2} + \tfrac{d \sqrt{\log 1/\delta}}{\alpha \varepsilon} + \tfrac{d \log 1/\delta}{\varepsilon}$.
arXiv Detail & Related papers (2023-01-28T16:57:46Z) - New Lower Bounds for Private Estimation and a Generalized Fingerprinting
Lemma [10.176795938619417]
We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, \delta)$-differential privacy.
We show that estimating the covariance of a Gaussian in Frobenius norm requires $\Omega(d^2)$ samples, and in spectral norm requires $\Omega(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors.
arXiv Detail & Related papers (2022-05-17T17:55:10Z) - Covariance-Aware Private Mean Estimation Without Private Covariance Estimation [10.036088581191592]
We present two sample-efficient differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions.
Our estimators output $\tilde{\mu}$ such that $\|\tilde{\mu} - \mu\|_\Sigma \leq \alpha$, where $\|\cdot\|_\Sigma$ is the Mahalanobis distance.
arXiv Detail & Related papers (2021-06-24T21:40:07Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for this task, even when the contamination parameter $\epsilon$ is an arbitrarily small constant.
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Consistent regression when oblivious outliers overwhelm [8.873449722727026]
Prior to our work, even for Gaussian $X$, no estimator for $\beta^*$ was known to be consistent in this model.
We show that consistent estimation is possible with nearly linear sample size and inverse-polynomial inlier fraction.
The model studied here also captures heavy-tailed noise distributions that may not even have a first moment.
arXiv Detail & Related papers (2020-09-30T16:21:34Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L^4$-$L^2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to that of the best distribution in $\mathcal{Q}$.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)