Gaussian Differential Private Bootstrap by Subsampling
- URL: http://arxiv.org/abs/2505.01197v1
- Date: Fri, 02 May 2025 11:40:50 GMT
- Title: Gaussian Differential Private Bootstrap by Subsampling
- Authors: Holger Dette, Carina Graw
- Abstract summary: We propose a private empirical $m$ out of $n$ bootstrap and validate its consistency and privacy guarantees under Gaussian Differential Privacy. Compared to the private $n$ out of $n$ bootstrap, our approach has several advantages. First, it comes with lower computational costs, in particular for massive data.
- Score: 1.0742675209112622
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Bootstrap is a common tool for quantifying uncertainty in data analysis. However, besides the additional computational cost of applying the bootstrap to massive data, a challenging problem for bootstrap-based inference under Differential Privacy is that it requires repeated access to the data. As a consequence, bootstrap-based differentially private inference requires a significant increase of the privacy budget, which in turn comes with a substantial loss in statistical accuracy. A potential solution to reconcile the conflicting goals of statistical accuracy and privacy is to analyze the data under parametric model assumptions, and in the last decade several parametric bootstrap methods for inference under privacy have been investigated. However, uncertainty quantification by parametric bootstrap is only valid if the quantities of interest can be identified as the parameters of a statistical model and the imposed model assumptions are (at least approximately) satisfied. An alternative to parametric methods is the empirical bootstrap, which is a widely used tool for non-parametric inference and well studied in the non-private regime. However, under privacy, less insight is available. In this paper, we propose a private empirical $m$ out of $n$ bootstrap and validate its consistency and privacy guarantees under Gaussian Differential Privacy. Compared to the private $n$ out of $n$ bootstrap, our approach has several advantages. First, it comes with lower computational costs, in particular for massive data. Second, the proposed procedure needs less additional noise in the bootstrap iterations, which leads to improved statistical accuracy while asymptotically guaranteeing the same level of privacy. Third, we demonstrate much better finite sample properties compared to the currently available procedures.
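The mechanism described in the abstract can be illustrated with a minimal sketch: in each bootstrap iteration, subsample $m$ of the $n$ observations, compute a bounded statistic on the subsample, and perturb it with Gaussian noise. The code below is an illustrative assumption, not the paper's procedure: it uses a clipped sample mean as the statistic, subsamples without replacement, and takes the noise scale `sigma` as a user-supplied parameter rather than deriving the Gaussian-DP calibration from the paper's analysis.

```python
import numpy as np

def private_m_out_of_n_bootstrap(x, m, B, sigma, clip=1.0, seed=None):
    """Illustrative sketch of a private empirical m-out-of-n bootstrap.

    Statistic: clipped sample mean (clipping bounds its sensitivity).
    Each replicate is computed on a subsample of size m drawn without
    replacement and perturbed with Gaussian noise of scale `sigma`;
    calibrating `sigma` to a target Gaussian-DP level is the subject of
    the paper and is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x, dtype=float), -clip, clip)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.choice(len(x), size=m, replace=False)   # subsample m of n
        reps[b] = x[idx].mean() + rng.normal(0.0, sigma)  # noisy replicate
    # In practice the replicates are re-centered/re-scaled (e.g. by sqrt(m/n))
    # so that their spread matches the sampling distribution at sample size n.
    return reps

# Toy usage: approximate 95% interval for the mean of synthetic data.
data = np.random.default_rng(0).normal(0.3, 1.0, size=10_000)
reps = private_m_out_of_n_bootstrap(data, m=500, B=200, sigma=0.01, seed=1)
print(np.quantile(reps, [0.025, 0.975]))
```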
Related papers
- Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap [12.65121513620053]
Existing uses of the parametric bootstrap on privatized data have ignored or avoided handling the effect of clamping. We propose using the indirect inference method to estimate the parameter values consistently. Our framework produces confidence intervals with well-calibrated coverage and performs hypothesis testing with the correct type I error.
arXiv Detail & Related papers (2025-07-14T19:12:16Z) - Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines. We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent [1.0742675209112622]
Stochastic Gradient Descent (SGD) is a widely used tool in machine learning.
Uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors.
We propose a novel block bootstrap for SGD under local differential privacy.
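As a rough illustration of the block-bootstrap idea (not the authors' procedure), the sketch below resamples non-overlapping blocks of a one-dimensional iterate trajectory to estimate the standard error of the averaged iterate; any local-DP noise is assumed to already be part of the iterates.

```python
import numpy as np

def block_bootstrap_se(iterates, block_len, B, seed=None):
    """Non-overlapping block bootstrap over (privatized) SGD iterates,
    estimating the standard error of the averaged iterate. Any local-DP
    noise is assumed to have been injected during the SGD run itself."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(iterates, dtype=float)             # 1-D trajectory for simplicity
    n_blocks = len(theta) // block_len
    blocks = theta[: n_blocks * block_len].reshape(n_blocks, block_len)
    boot_means = np.empty(B)
    for b in range(B):
        pick = rng.integers(0, n_blocks, size=n_blocks)   # resample whole blocks
        boot_means[b] = blocks[pick].mean()               # mean of the resampled path
    return boot_means.std(ddof=1)

# Toy usage with a synthetic, noisy iterate trajectory (stand-in for SGD output).
rng = np.random.default_rng(0)
traj = 1.0 + rng.normal(0, 0.05, size=5000).cumsum() / np.arange(1, 5001)
print(block_bootstrap_se(traj, block_len=50, B=500, seed=1))
```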
arXiv Detail & Related papers (2024-05-21T07:47:21Z) - Resampling methods for private statistical inference [1.8110941972682346]
We consider the task of constructing confidence intervals with differential privacy.
We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data.
For a fixed differential privacy parameter $\epsilon$, our methods enjoy the same error rates as the non-private bootstrap to within logarithmic factors in the sample size $n$.
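A hedged sketch of the subsample-and-aggregate idea behind this construction: split the data into partitions, run a small bootstrap on each partition, and release a differentially private median of the per-partition results. The exponential-mechanism median and the per-partition statistic below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def dp_median_exp_mech(values, grid, eps, rng):
    """Exponential-mechanism median over a fixed candidate grid.
    Utility u(t) = -|#{v < t} - k/2| has sensitivity 1 when one value changes."""
    v = np.sort(np.asarray(values, dtype=float))
    ranks = np.searchsorted(v, grid)              # #{v < t} for each candidate t
    utility = -np.abs(ranks - len(v) / 2.0)
    probs = np.exp(eps * utility / 2.0)           # exp(eps * u / (2 * sensitivity))
    probs /= probs.sum()
    return rng.choice(grid, p=probs)

def little_bootstraps_private_median(x, k, B, eps, grid, seed=None):
    """Sketch: partition the data, run a small bootstrap on each partition to
    get a per-partition estimate (here a bootstrap mean), then release a
    private median of the k estimates via the exponential mechanism."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(x), k)
    per_part = []
    for p in parts:
        boots = [rng.choice(p, size=len(p), replace=True).mean() for _ in range(B)]
        per_part.append(np.mean(boots))
    return dp_median_exp_mech(per_part, grid, eps, rng)

# Toy usage.
x = np.random.default_rng(0).normal(1.0, 2.0, size=20_000)
grid = np.linspace(0.0, 2.0, 201)
print(little_bootstraps_private_median(x, k=50, B=25, eps=1.0, grid=grid, seed=1))
```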
arXiv Detail & Related papers (2024-02-11T08:59:02Z) - General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces at least approximately as much error.
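For concreteness, a minimal sketch of the Gaussian mechanism applied to a sample mean, using the standard Gaussian-DP calibration $\sigma = \Delta/\mu$; the bounds `lower`/`upper` are assumed to be known a priori, so the released estimate stays unbiased. This illustrates the classical mechanism mentioned above, not the paper's optimality analysis.

```python
import numpy as np

def gdp_mean(x, lower, upper, mu, seed=None):
    """Gaussian mechanism for the mean of data assumed to lie in [lower, upper].
    Adding N(0, (Delta/mu)^2) noise to a statistic with L2 sensitivity Delta
    satisfies mu-Gaussian differential privacy; here Delta = (upper - lower)/n.
    Because the bounds are fixed a priori (no data-dependent clipping),
    the released estimate is unbiased."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    delta = (upper - lower) / len(x)      # sensitivity of the sample mean
    sigma = delta / mu                    # Gaussian-DP noise calibration
    return x.mean() + rng.normal(0.0, sigma)

# Toy usage: ratings in [1, 5], released under 1-GDP.
ratings = np.random.default_rng(0).integers(1, 6, size=5000)
print(gdp_mean(ratings, lower=1, upper=5, mu=1.0, seed=1))
```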
arXiv Detail & Related papers (2023-01-31T18:47:42Z) - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Robust and Differentially Private Mean Estimation [40.323756738056616]
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices.
An increasing number of such databases consist of data from multiple sources, not all of which can be trusted.
This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data.
arXiv Detail & Related papers (2021-02-18T05:02:49Z) - On the Intrinsic Differential Privacy of Bagging [69.70602220716718]
Our experimental results demonstrate that Bagging achieves significantly higher accuracies than state-of-the-art differentially private machine learning methods with the same privacy budgets.
arXiv Detail & Related papers (2020-08-22T14:17:55Z) - Parametric Bootstrap for Differentially Private Confidence Intervals [8.781431682774484]
We develop a practical and general-purpose approach to construct confidence intervals for differentially private parametric estimation.
We find that the parametric bootstrap is a simple and effective solution.
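A minimal sketch of the general recipe (not the authors' exact algorithm): release a private point estimate, simulate datasets from the fitted parametric model, rerun the same privatized estimator on each simulated dataset, and read off quantiles. The Gaussian model, the Gaussian-DP mean release, and the non-private plug-in scale below are illustrative assumptions.

```python
import numpy as np

def dp_gaussian_mean_estimate(x, lower, upper, mu, rng):
    """Privatized sample mean for data clipped to [lower, upper]
    (Gaussian mechanism with mu-GDP calibration sigma = Delta/mu)."""
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    sigma = (upper - lower) / (len(x) * mu)
    return x.mean() + rng.normal(0.0, sigma)

def parametric_bootstrap_ci(x, lower, upper, mu, B=2000, alpha=0.05, seed=None):
    """Sketch of a parametric-bootstrap confidence interval:
    1. release a private point estimate,
    2. simulate B synthetic datasets from the fitted parametric model
       (a Gaussian with a non-private plug-in scale, purely for illustration;
       a fully private procedure would also privatize the scale),
    3. apply the *same* privatized estimator to every synthetic dataset,
    4. form a basic bootstrap interval from the simulated estimates."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat = dp_gaussian_mean_estimate(x, lower, upper, mu, rng)
    scale = np.std(x)                              # illustration only, not privatized
    sims = np.empty(B)
    for b in range(B):
        x_sim = rng.normal(theta_hat, scale, size=n)
        sims[b] = dp_gaussian_mean_estimate(x_sim, lower, upper, mu, rng)
    lo, hi = np.quantile(sims - theta_hat, [alpha / 2, 1 - alpha / 2])
    return theta_hat - hi, theta_hat - lo          # basic (pivotal) bootstrap interval

# Toy usage.
x = np.random.default_rng(0).normal(2.0, 1.0, size=2000)
print(parametric_bootstrap_ci(x, lower=-5, upper=5, mu=0.5, seed=1))
```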
arXiv Detail & Related papers (2020-06-14T00:08:19Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)