On Differentially Private U Statistics
- URL: http://arxiv.org/abs/2407.04945v1
- Date: Sat, 6 Jul 2024 03:27:14 GMT
- Title: On Differentially Private U Statistics
- Authors: Kamalika Chaudhuri, Po-Ling Loh, Shourya Pandey, Purnamrita Sarkar,
- Abstract summary: We propose a new thresholding-based approach using emphlocal H'ajek projections to reweight different subsets of the data.
This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
- Score: 25.683071759227293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of privately estimating a parameter $\mathbb{E}[h(X_1,\dots,X_k)]$, where $X_1$, $X_2$, $\dots$, $X_k$ are i.i.d. data from some distribution and $h$ is a permutation-invariant function. Without privacy constraints, standard estimators are U-statistics, which commonly arise in a wide range of problems, including nonparametric signed rank tests, symmetry testing, uniformity testing, and subgraph counts in random networks, and can be shown to be minimum variance unbiased estimators under mild conditions. Despite the recent outpouring of interest in private mean estimation, privatizing U-statistics has received little attention. While existing private mean estimation algorithms can be applied to obtain confidence intervals, we show that they can lead to suboptimal private error, e.g., constant-factor inflation in the leading term, or even $\Theta(1/n)$ rather than $O(1/n^2)$ in degenerate settings. To remedy this, we propose a new thresholding-based approach using \emph{local H\'ajek projections} to reweight different subsets of the data. This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
Related papers
- Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares [38.478776450327125]
We present a sample- and time-efficient differentially private algorithm for ordinary least squares.
Our near-optimal accuracy holds for any dataset with the condition number, or exponential time.
arXiv Detail & Related papers (2024-04-23T18:00:38Z) - Better and Simpler Lower Bounds for Differentially Private Statistical
Estimation [7.693388437377614]
We prove that for any $alpha le O(1)$, estimating the covariance of a Gaussian up to spectral error $alpha$ requires $tildeOmegaleft(fracd3/2alpha varepsilon + fracdalpha2right)$ samples.
Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $tildeOmegaleft(fracdalphak/(k-1) varepsilon +
arXiv Detail & Related papers (2023-10-10T04:02:43Z) - Some Constructions of Private, Efficient, and Optimal $K$-Norm and Elliptic Gaussian Noise [54.34628844260993]
Differentially private computation often begins with a bound on some $d$-dimensional statistic's sensitivity.
For pure differential privacy, the $K$-norm mechanism can improve on this approach using a norm tailored to the statistic's sensitivity space.
This paper solves both problems for the simple statistics of sum, count, and vote.
arXiv Detail & Related papers (2023-09-27T17:09:36Z) - General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean
Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error.
arXiv Detail & Related papers (2023-01-31T18:47:42Z) - DP-PCA: Statistically Optimal and Differentially Private PCA [44.22319983246645]
DP-PCA is a single-pass algorithm that overcomes both limitations.
For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=tilde O(d)$.
arXiv Detail & Related papers (2022-05-27T02:02:17Z) - Optimal Algorithms for Mean Estimation under Local Differential Privacy [55.32262879188817]
We show that PrivUnit achieves the optimal variance among a large family of locally private randomizers.
We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees.
arXiv Detail & Related papers (2022-05-05T06:43:46Z) - Nonparametric extensions of randomized response for private confidence sets [51.75485869914048]
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP)
We present confidence intervals (CI) and time-uniform confidence sequences (CS) for $mustar$ when only given access to the privatized data.
arXiv Detail & Related papers (2022-02-17T16:04:49Z) - Covariance-Aware Private Mean Estimation Without Private Covariance Estimation [10.036088581191592]
We present two sample-efficient differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions.
Our estimators output $tildemu$ such that $| tildemu - mu |_Sigma leq alpha$, where $| cdot |_Sigma$ is the Mahalanobis distance.
arXiv Detail & Related papers (2021-06-24T21:40:07Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Dimension-agnostic inference using cross U-statistics [33.17951971728784]
We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization.
The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks.
arXiv Detail & Related papers (2020-11-10T12:21:34Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.