Privately Estimating a Gaussian: Efficient, Robust and Optimal
- URL: http://arxiv.org/abs/2212.08018v2
- Date: Thu, 1 Jun 2023 16:38:58 GMT
- Title: Privately Estimating a Gaussian: Efficient, Robust and Optimal
- Authors: Daniel Alabi, Pravesh K. Kothari, Pranay Tankala, Prayaag Venkat, Fred
Zhang
- Abstract summary: We give efficient algorithms for privately estimating a Gaussian distribution in both pure and approximate differential privacy (DP) models.
In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution up to an arbitrarily tiny total variation error.
For the special case of mean estimation, our algorithm achieves the optimal sample complexity of $\widetilde{O}(d)$, improving on a $\widetilde{O}(d^{1.5})$ bound from prior work.
- Score: 6.901744415870126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we give efficient algorithms for privately estimating a
Gaussian distribution in both pure and approximate differential privacy (DP)
models with optimal dependence on the dimension in the sample complexity. In
the pure DP setting, we give an efficient algorithm that estimates an unknown
$d$-dimensional Gaussian distribution up to an arbitrarily tiny total variation
error using $\widetilde{O}(d^2 \log \kappa)$ samples while tolerating a
constant fraction of adversarial outliers. Here, $\kappa$ is the condition
number of the target covariance matrix. The sample bound matches the best
non-private estimators in the dependence on the dimension (up to a
polylogarithmic factor). We prove a new lower bound on differentially private
covariance estimation to show that the dependence on the condition number
$\kappa$ in the above sample bound is also tight. Prior to our work, only
identifiability results (yielding inefficient super-polynomial time algorithms)
were known for the problem. In the approximate DP setting, we give an efficient
algorithm to estimate an unknown Gaussian distribution up to an arbitrarily
tiny total variation error using $\widetilde{O}(d^2)$ samples while tolerating
a constant fraction of adversarial outliers. Prior to our work, all efficient
approximate DP algorithms incurred a super-quadratic sample cost or were not
outlier-robust. For the special case of mean estimation, our algorithm achieves
the optimal sample complexity of $\widetilde O(d)$, improving on a $\widetilde
O(d^{1.5})$ bound from prior work. Our pure DP algorithm relies on a recursive
private preconditioning subroutine that utilizes the recent work on private
mean estimation [Hopkins et al., 2022]. Our approximate DP algorithms are based
on a substantial upgrade of the method of stabilizing convex relaxations
introduced in [Kothari et al., 2022].
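As background for the two privacy models above, here is a minimal sketch of the standard clip-and-noise Gaussian mechanism for approximate-DP mean estimation. It is not the paper's algorithm; the clipping radius `R` and the simple accounting are assumptions for illustration. The point is that naive noise addition pays for a worst-case norm bound on the data, which is exactly the kind of overhead the paper's recursive private preconditioning is designed to avoid.

```python
import numpy as np

def gaussian_mechanism_mean(X, R, eps, delta, rng=None):
    """Baseline (eps, delta)-DP mean estimate via clip-and-noise.

    Each row of X is one sample; clipping to l2-norm R caps the
    sensitivity of the empirical mean at 2R/n, and Gaussian noise
    calibrated to that sensitivity yields approximate DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_clipped = X * np.minimum(1.0, R / np.maximum(norms, 1e-12))
    sensitivity = 2.0 * R / n  # replacing one sample moves the mean this much
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return X_clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)

# Toy usage: the added noise scales with R, so a loose a-priori bound on
# the unknown scale translates directly into extra samples.
X = np.random.default_rng(0).normal(size=(10_000, 50)) + 3.0
print(gaussian_mechanism_mean(X, R=10.0, eps=1.0, delta=1e-6))
```

Unlike this baseline, the paper's estimators also tolerate a constant fraction of adversarial outliers, which clip-and-noise alone does not.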
Related papers
- Private Mean Estimation with Person-Level Differential Privacy [6.621676316292624]
We study person-level differentially private mean estimation in the case where each person holds multiple samples.
We give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP.
arXiv Detail & Related papers (2024-05-30T18:20:35Z) - Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination [42.526664955704746]
We study sparse estimation tasks in Huber's contamination model with a focus on mean estimation, PCA, and linear regression.
For each of these tasks, we give the first sample-efficient and computationally efficient robust estimators with optimal error guarantees.
At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications (a generic, non-sparse filtering round is sketched after this list).
arXiv Detail & Related papers (2024-03-15T15:51:27Z) - Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy [23.12198546384976]
Posterior sampling provides $\varepsilon$-pure differential privacy guarantees.
It does not suffer from the potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP.
In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo.
arXiv Detail & Related papers (2023-10-23T07:54:39Z) - Some Constructions of Private, Efficient, and Optimal $K$-Norm and Elliptic Gaussian Noise [54.34628844260993]
Differentially private computation often begins with a bound on some $d$-dimensional statistic's sensitivity.
For pure differential privacy, the $K$-norm mechanism can improve on this approach using a norm tailored to the statistic's sensitivity space.
In general, however, writing down this norm and efficiently sampling noise from its unit ball can both be difficult. This paper solves both problems for the simple statistics of sum, count, and vote (a scalar special case, the Laplace mechanism, is sketched after this list).
arXiv Detail & Related papers (2023-09-27T17:09:36Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD (a minimal clipped/normalized gradient step is sketched after this list).
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers.
Our algorithms follow the Sum-of-Squares-based proofs-to-algorithms approach.
arXiv Detail & Related papers (2022-06-07T16:49:54Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of our algorithms over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)
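For the $K$-norm entry above: in one dimension, the optimal noise for a counting query reduces to the familiar Laplace mechanism calibrated to the query's sensitivity. A minimal pure-DP sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def laplace_count(data, predicate, eps, rng=None):
    """Pure eps-DP count: adding or removing one record changes the
    count by at most 1, so Laplace noise of scale 1/eps suffices."""
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(0.0, 1.0 / eps)

# Toy usage: privately count entries above a threshold.
print(laplace_count(range(100), lambda x: x > 42, eps=0.5))
```

The $K$-norm mechanism generalizes this by drawing $d$-dimensional noise with density proportional to $\exp(-\epsilon \|z\|_K)$ for a norm $\|\cdot\|_K$ matched to the statistic's sensitivity space; the cited paper constructs efficient samplers for sum, count, and vote.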
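For the DP-SGD/DP-NSGD entry: the core of both algorithms is a per-example gradient transformation before noisy averaging. Clipping caps each gradient's norm at a threshold $C$, while normalization rescales every gradient to roughly the same norm. A hedged sketch of one noisy step, with illustrative hyperparameter names:

```python
import numpy as np

def private_grad_step(per_example_grads, C, sigma, normalize=False, r=1e-4, rng=None):
    """One DP-SGD-style update direction: clip (or normalize) each
    per-example gradient to norm about C, average, add Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = per_example_grads.shape
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    if normalize:
        # DP-NSGD-style: rescale every gradient; r keeps this defined at 0.
        scale = C / (norms + r)
    else:
        # DP-SGD-style: clip large gradients, leave small ones untouched.
        scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
    avg = (per_example_grads * scale).mean(axis=0)
    return avg + rng.normal(0.0, sigma * C / n, size=d)

grads = np.random.default_rng(1).normal(size=(256, 10))
print(private_grad_step(grads, C=1.0, sigma=1.0, normalize=True))
```

One intuition for the tuning observation: clipping leaves a threshold $C$ whose bias depends on the gradient scale, while normalization makes the update direction scale-free in the gradient magnitudes.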
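For the robust-estimation entries (multidimensional filtering, Sum-of-Squares): a standard building block in this literature is the filter, which scores points along the top principal direction of the empirical covariance and removes points randomly with probability proportional to their score, so outliers are dropped far more often than inliers. A simplified, non-sparse one-round variant (not the cited papers' machinery):

```python
import numpy as np

def filter_round(X, rng=None):
    """One filtering round for robust mean estimation: project onto the
    top eigenvector of the empirical covariance, then drop each point
    with probability proportional to its squared deviation, so outliers
    (which inflate the top eigenvalue) are preferentially removed."""
    rng = np.random.default_rng() if rng is None else rng
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]                     # top principal direction
    tau = ((X - mu) @ v) ** 2              # outlier scores
    keep = rng.random(len(X)) > tau / tau.max()
    return X[keep]

# Toy usage: 5% of points are shifted outliers; a few filtering rounds
# pull the empirical mean back toward the true mean (the origin).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(size=(950, 20)), rng.normal(loc=5.0, size=(50, 20))])
for _ in range(5):
    X = filter_round(X)
print(np.linalg.norm(X.mean(axis=0)))
```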
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.