DP-PCA: Statistically Optimal and Differentially Private PCA
- URL: http://arxiv.org/abs/2205.13709v1
- Date: Fri, 27 May 2022 02:02:17 GMT
- Title: DP-PCA: Statistically Optimal and Differentially Private PCA
- Authors: Xiyang Liu, Weihao Kong, Prateek Jain, Sewoong Oh
- Abstract summary: DP-PCA is a single-pass algorithm that overcomes both limitations.
For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=\tilde O(d)$.
- Score: 44.22319983246645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the canonical statistical task of computing the principal component
from $n$ i.i.d.~data in $d$ dimensions under
$(\varepsilon,\delta)$-differential privacy. Although extensively studied in
literature, existing solutions fall short on two key aspects: ($i$) even for
Gaussian data, existing private algorithms require the number of samples $n$ to
scale super-linearly with $d$, i.e., $n=\Omega(d^{3/2})$, to obtain non-trivial
results while non-private PCA requires only $n=O(d)$, and ($ii$) existing
techniques suffer from a non-vanishing error even when the randomness in each
data point is arbitrarily small. We propose DP-PCA, which is a single-pass
algorithm that overcomes both limitations. It is based on a private minibatch
gradient ascent method that relies on {\em private mean estimation}, which adds
minimal noise required to ensure privacy by adapting to the variance of a given
minibatch of gradients. For sub-Gaussian data, we provide nearly optimal
statistical error rates even for $n=\tilde O(d)$. Furthermore, we provide a
lower bound showing that the sub-Gaussian style assumption is necessary for obtaining
the optimal error rate.
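The private minibatch gradient ascent idea can be pictured with a short NumPy sketch. This is a hedged reading of the abstract, not the authors' DP-PCA: the function name, step size, quantile-based spread estimate, and noise calibration are assumptions, and the privacy costs of the robust-center/quantile steps and of composing across batches are ignored.

```python
import numpy as np

def dp_pca_sketch(X, eps, delta, batch_size=500, lr=0.1, seed=0):
    """Illustrative single-pass private minibatch gradient ascent for the top
    principal component (a sketch of the idea, not the paper's algorithm).
    Noise is scaled to the spread of the current minibatch's gradients, so it
    shrinks when the data carry little randomness. The median/quantile steps
    and cross-batch composition are left non-private here for brevity.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    sigma_mult = np.sqrt(2 * np.log(1.25 / delta)) / eps  # Gaussian-mechanism scale
    for start in range(0, n - batch_size + 1, batch_size):  # one pass over the data
        B = X[start:start + batch_size]
        grads = (B @ w)[:, None] * B                 # per-sample gradients of w'Sw/2
        center = np.median(grads, axis=0)            # crude robust center (not privatized)
        dists = np.linalg.norm(grads - center, axis=1)
        tau = np.quantile(dists, 0.9)                # spread estimate: adapts to variance
        clip = np.minimum(1.0, tau / np.maximum(dists, 1e-12))
        clipped = center + (grads - center) * clip[:, None]
        noise = rng.standard_normal(d) * (2 * tau * sigma_mult / batch_size)
        w = w + lr * (clipped.mean(axis=0) + noise)  # private mean of the gradients
        w /= np.linalg.norm(w)                       # project back onto the unit sphere
    return w
```

The point of the sketch is the noise scale: it tracks the spread $\tau$ of the current minibatch's gradients rather than a worst-case global sensitivity, so the added noise vanishes as the randomness in each data point shrinks, which is exactly limitation ($ii$) above.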
Related papers
- Private Mean Estimation with Person-Level Differential Privacy [6.621676316292624]
We study person-level differentially private mean estimation in the case where each person holds multiple samples.
We give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP.
arXiv Detail & Related papers (2024-05-30T18:20:35Z) - Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression [4.396860522241307]
We show that an efficient learning algorithm for sparse linear regression can be used to solve sparse PCA problems with a negative spike.
We complement our reduction with low-degree and statistical query lower bounds for the sparse problems from which we reduce.
arXiv Detail & Related papers (2024-02-21T19:55:01Z) - PLAN: Variance-Aware Private Mean Estimation [12.452663855242344]
Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning.
We show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with small $\ell_2$ error.
To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real-world data; a hedged sketch of the variance-aware idea appears after this list.
arXiv Detail & Related papers (2023-06-14T21:04:50Z) - Privately Estimating a Gaussian: Efficient, Robust and Optimal [6.901744415870126]
We give efficient algorithms for privately estimating a Gaussian distribution in both pure and approximate differential privacy (DP) models.
In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution up to an arbitrary tiny total variation error.
For the special case of mean estimation, our algorithm achieves the optimal sample complexity of $\widetilde O(d)$, improving on a $\widetilde O(d^{1.5})$ bound from prior work.
arXiv Detail & Related papers (2022-12-15T18:27:39Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - (Nearly) Optimal Private Linear Regression via Adaptive Clipping [22.639650869444395]
We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution.
We propose and analyze a one-pass mini-batch gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement; a sketch of the adaptive-clipping idea appears after this list.
arXiv Detail & Related papers (2022-07-11T08:04:46Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
On two popular deep learning tasks, we demonstrate empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to that of the best distribution in $\mathcal{Q}$.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
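For the PLAN entry above, the following is a minimal variance-aware sketch: clip each coordinate to a window proportional to its standard deviation and add matched per-coordinate noise, so low-variance coordinates receive less noise. It assumes $\boldsymbol{\sigma}$ and the clipping center are public, and its $\sqrt{d}$ budget split is a loose, illustrative calibration rather than the paper's analysis.

```python
import numpy as np

def variance_aware_private_mean(X, sigma, eps, delta, c=3.0, seed=0):
    """A minimal variance-aware sketch (not PLAN itself): clip coordinate i
    to a window proportional to the assumed-public sigma[i], then add
    per-coordinate Gaussian noise scaled to that window.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    center = np.median(X, axis=0)        # rough center; privatizing it is omitted here
    half = c * np.asarray(sigma, dtype=float)
    clipped = np.clip(X, center - half, center + half)
    sigma_mult = np.sqrt(2 * np.log(1.25 / delta)) / eps
    # One row can move coordinate i of the mean by at most 2*half[i]/n;
    # splitting the budget naively over d coordinates costs a sqrt(d) factor.
    noise = rng.standard_normal(d) * (2 * half * np.sqrt(d) * sigma_mult / n)
    return clipped.mean(axis=0) + noise
```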
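For the adaptive clipping entry (DP-AMBSSGD), the following one-pass sketch shows a batch-adaptive clipping threshold in the spirit of that summary; the function name and constants are hypothetical, and the threshold is left non-private for brevity, which a full treatment would not allow.

```python
import numpy as np

def private_linear_regression_sgd(X, y, eps, delta, batch_size=256, lr=0.05, seed=0):
    """One-pass minibatch DP-SGD for linear regression with a batch-adaptive
    clipping threshold (a sketch, not the paper's exact method): the
    threshold is the non-private median gradient norm, and per-batch
    privacy losses are not composed over the pass.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    sigma_mult = np.sqrt(2 * np.log(1.25 / delta)) / eps
    for start in range(0, n - batch_size + 1, batch_size):  # each point used once
        B, yb = X[start:start + batch_size], y[start:start + batch_size]
        grads = (B @ w - yb)[:, None] * B          # per-sample squared-loss gradients
        norms = np.linalg.norm(grads, axis=1)
        tau = np.median(norms)                      # adaptive threshold (not privatized here)
        clipped = grads * np.minimum(1.0, tau / np.maximum(norms, 1e-12))[:, None]
        noise = rng.standard_normal(d) * (2 * tau * sigma_mult / batch_size)
        w = w - lr * (clipped.mean(axis=0) + noise) # noisy gradient step
    return w
```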