Statistical Inference for Privatized Data with Unknown Sample Size
- URL: http://arxiv.org/abs/2406.06231v1
- Date: Mon, 10 Jun 2024 13:03:20 GMT
- Title: Statistical Inference for Privatized Data with Unknown Sample Size
- Authors: Jordan Awan, Andres Felipe Barrientos, Nianqiao Ju,
- Abstract summary: We develop both theory and algorithms to analyze privatized data in the unbounded differential privacy(DP)
We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity.
- Score: 7.933465724913661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop both theory and algorithms to analyze privatized data in the unbounded differential privacy(DP), where even the sample size is considered a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ is at an appropriate rate; we also establish that ABC-type posterior distributions converge under similar assumptions. We further give asymptotic results in the regime where the privacy budget for $n$ goes to zero, establishing similarity of sampling distributions as well as showing that the MLE in the unbounded setting converges to the bounded-DP MLE. In order to facilitate valid, finite-sample Bayesian inference on privatized data in the unbounded DP setting, we propose a reversible jump MCMC algorithm which extends the data augmentation MCMC of Ju et al. (2022). We also propose a Monte Carlo EM algorithm to compute the MLE from privatized data in both bounded and unbounded DP. We apply our methodology to analyze a linear regression model as well as a 2019 American Time Use Survey Microdata File which we model using a Dirichlet distribution.
Related papers
- Scalable DP-SGD: Shuffling vs. Poisson Subsampling [61.19794019914523]
We provide new lower bounds on the privacy guarantee of the multi-epoch Adaptive Linear Queries (ABLQ) mechanism with shuffled batch sampling.
We show substantial gaps when compared to Poisson subsampling; prior analysis was limited to a single epoch.
We introduce a practical approach to implement Poisson subsampling at scale using massively parallel computation.
arXiv Detail & Related papers (2024-11-06T19:06:16Z) - Adaptively Private Next-Token Prediction of Large Language Models [13.297381972044558]
We introduce a noisy screening mechanism that filters out queries with potentially expensive privacy loss.
AdaPMixED can reduce the privacy loss by 16x while preserving the utility over the original PMixED.
arXiv Detail & Related papers (2024-10-02T20:34:24Z) - How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analysis when using the two types of batch sampling.
Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling.
arXiv Detail & Related papers (2024-03-26T13:02:43Z) - LLM-based Privacy Data Augmentation Guided by Knowledge Distillation
with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z) - Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy [23.12198546384976]
Posterior sampling provides $varepsilon$-pure differential privacy guarantees.
It does not suffer from potentially unbounded privacy breach introduced by $(varepsilon,delta)$-approximate DP.
In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo.
arXiv Detail & Related papers (2023-10-23T07:54:39Z) - Probing the Transition to Dataset-Level Privacy in ML Models Using an
Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - A Bias-Accuracy-Privacy Trilemma for Statistical Estimation [16.365507345447803]
We show that no algorithm can simultaneously have low bias, low error, and low privacy loss for arbitrary distributions.
We show that unbiased mean estimation is possible under a more permissive notion of differential privacy.
arXiv Detail & Related papers (2023-01-30T23:40:20Z) - The Poisson binomial mechanism for secure and private federated learning [19.399122892615573]
We introduce a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics.
We provide a tight analysis of its privacy guarantees, showing that it achieves the same privacy-accuracy trade-offs as the continuous Gaussian mechanism.
arXiv Detail & Related papers (2022-07-09T05:46:28Z) - Optimal Membership Inference Bounds for Adaptive Composition of Sampled
Gaussian Mechanisms [93.44378960676897]
Given a trained model and a data sample, membership-inference (MI) attacks predict whether the sample was in the model's training set.
A common countermeasure against MI attacks is to utilize differential privacy (DP) during model training to mask the presence of individual examples.
In this paper, we derive bounds for the textitadvantage of an adversary mounting a MI attack, and demonstrate tightness for the widely-used Gaussian mechanism.
arXiv Detail & Related papers (2022-04-12T22:36:56Z) - Nonparametric extensions of randomized response for private confidence sets [51.75485869914048]
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP)
We present confidence intervals (CI) and time-uniform confidence sequences (CS) for $mustar$ when only given access to the privatized data.
arXiv Detail & Related papers (2022-02-17T16:04:49Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for bound non-dimensional optimization.
We demonstrate two popular deep learning methods on the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.