Public-data Assisted Private Stochastic Optimization: Power and
Limitations
- URL: http://arxiv.org/abs/2403.03856v1
- Date: Wed, 6 Mar 2024 17:06:11 GMT
- Title: Public-data Assisted Private Stochastic Optimization: Power and
Limitations
- Authors: Enayat Ullah, Michael Menart, Raef Bassily, Crist\'obal Guzm\'an,
Raman Arora
- Abstract summary: We study the limits and capability of public-data assisted differentially private (PA-DP) algorithms.
For complete/labeled public data, we show that any $tildeOmegabig(minbigfrac1sqrtn+fracsqrtdnepsilon big big)$ has excess risk.
We also study PA-DP supervised learning with textitunlabeled public samples.
- Score: 30.298342283075172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the limits and capability of public-data assisted differentially
private (PA-DP) algorithms. Specifically, we focus on the problem of stochastic
convex optimization (SCO) with either labeled or unlabeled public data. For
complete/labeled public data, we show that any $(\epsilon,\delta)$-PA-DP has
excess risk
$\tilde{\Omega}\big(\min\big\{\frac{1}{\sqrt{n_{\text{pub}}}},\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\epsilon}
\big\} \big)$, where $d$ is the dimension, ${n_{\text{pub}}}$ is the number of
public samples, ${n_{\text{priv}}}$ is the number of private samples, and
$n={n_{\text{pub}}}+{n_{\text{priv}}}$. These lower bounds are established via
our new lower bounds for PA-DP mean estimation, which are of a similar form. Up
to constant factors, these lower bounds show that the simple strategy of either
treating all data as private or discarding the private data, is optimal. We
also study PA-DP supervised learning with \textit{unlabeled} public samples. In
contrast to our previous result, we here show novel methods for leveraging
public data in private supervised learning. For generalized linear models (GLM)
with unlabeled public data, we show an efficient algorithm which, given
$\tilde{O}({n_{\text{priv}}}\epsilon)$ unlabeled public samples, achieves the
dimension independent rate $\tilde{O}\big(\frac{1}{\sqrt{{n_{\text{priv}}}}} +
\frac{1}{\sqrt{{n_{\text{priv}}}\epsilon}}\big)$. We develop new lower bounds
for this setting which shows that this rate cannot be improved with more public
samples, and any fewer public samples leads to a worse rate. Finally, we
provide extensions of this result to general hypothesis classes with finite
fat-shattering dimension with applications to neural networks and non-Euclidean
geometries.
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved.
This is a novel emphstatistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z) - On Computing Pairwise Statistics with Local Differential Privacy [55.81991984375959]
We study the problem of computing pairwise statistics, i.e., ones of the form $binomn2-1 sum_i ne j f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model.
This formulation captures important metrics such as Kendall's $tau$ coefficient, Area Under Curve, Gini's mean difference, Gini's entropy, etc.
arXiv Detail & Related papers (2024-06-24T04:06:09Z) - Private Mean Estimation with Person-Level Differential Privacy [6.621676316292624]
We study person-level differentially private mean estimation in the case where each person holds multiple samples.
We give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP.
arXiv Detail & Related papers (2024-05-30T18:20:35Z) - Improved Analysis of Sparse Linear Regression in Local Differential
Privacy Model [38.66115499136791]
We revisit the problem of sparse linear regression in the local differential privacy (LDP) model.
We propose an innovative NLDP algorithm, the very first of its kind for the problem.
Our findings reveal fundamental differences between the non-private case, central DP model, and local DP model in the sparse linear regression problem.
arXiv Detail & Related papers (2023-10-11T10:34:52Z) - PLAN: Variance-Aware Private Mean Estimation [12.452663855242344]
Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning.
We show how to exploit skew in the vector $boldsymbolsigma$, obtaining a (zero-digma) differentially private mean estimate with $ell$ error.
To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real world data.
arXiv Detail & Related papers (2023-06-14T21:04:50Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP)
We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - DP-PCA: Statistically Optimal and Differentially Private PCA [44.22319983246645]
DP-PCA is a single-pass algorithm that overcomes both limitations.
For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=tilde O(d)$.
arXiv Detail & Related papers (2022-05-27T02:02:17Z) - Private Query Release Assisted by Public Data [96.6174729958211]
We study the problem of differentially private query release assisted by access to public data.
We show that we can solve the problem for any query class $mathcalH$ of finite VC-dimension using only $d/alpha$ public samples and $sqrtpd3/2/alpha2$ private samples.
arXiv Detail & Related papers (2020-04-23T02:46:37Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.