On the Intrinsic Differential Privacy of Bagging
- URL: http://arxiv.org/abs/2008.09845v1
- Date: Sat, 22 Aug 2020 14:17:55 GMT
- Title: On the Intrinsic Differential Privacy of Bagging
- Authors: Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong
- Abstract summary: We show that Bagging achieves significantly higher accuracies than state-of-the-art differentially private machine learning methods with the same privacy budgets.
- Score: 69.70602220716718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private machine learning trains models while protecting
privacy of the sensitive training data. The key to obtaining differentially
private models is to introduce noise/randomness into the training process. In
particular, existing differentially private machine learning methods add noise
to the training data, the gradients, the loss function, and/or the model
itself. Bagging, a popular ensemble learning framework, randomly creates some
subsamples of the training data, trains a base model for each subsample using a
base learner, and takes majority vote among the base models when making
predictions. Bagging has intrinsic randomness in the training process as it
randomly creates subsamples. Our major theoretical results show that such
intrinsic randomness already makes Bagging differentially private without the
need for additional noise. In particular, we prove that, for any base learner,
Bagging with and without replacement respectively achieves $\left(N\cdot k
\cdot \ln{\frac{n+1}{n}},1- (\frac{n-1}{n})^{N\cdot k}\right)$-differential
privacy and $\left(\ln{\frac{n+1}{n+1-N\cdot k}}, \frac{N\cdot k}{n}
\right)$-differential privacy, where $n$ is the training data size, $k$ is the
subsample size, and $N$ is the number of base models. Moreover, we prove that
if no assumptions about the base learner are made, our derived privacy
guarantees are tight. We empirically evaluate Bagging on MNIST and CIFAR10. Our
experimental results demonstrate that Bagging achieves significantly higher
accuracies than state-of-the-art differentially private machine learning
methods with the same privacy budgets.
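To make the setup concrete, below is a minimal sketch (not the authors' code) of Bagging's training and majority-vote prediction, together with the two $(\epsilon, \delta)$ bounds stated in the abstract evaluated from $n$, $k$, and $N$. The `fit` base learner and the sizes used in the usage example are hypothetical placeholders.
```python
# Minimal sketch of Bagging's intrinsic randomness and the paper's derived
# (epsilon, delta) guarantees. Assumptions: `fit` is any base learner that maps
# a subsample to a callable model; the sizes in the usage example are
# illustrative, not the paper's experimental settings.
import math
import random
from collections import Counter


def train_bagging(data, fit, N, k, with_replacement=True, seed=None):
    """Train N base models, each on a random subsample of size k."""
    rng = random.Random(seed)
    models = []
    for _ in range(N):
        if with_replacement:
            subsample = [rng.choice(data) for _ in range(k)]
        else:
            subsample = rng.sample(data, k)
        models.append(fit(subsample))  # no noise is added anywhere
    return models


def predict_bagging(models, x):
    """Majority vote among the base models' predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def privacy_with_replacement(n, N, k):
    """(epsilon, delta) bound for subsampling with replacement."""
    eps = N * k * math.log((n + 1) / n)
    delta = 1.0 - ((n - 1) / n) ** (N * k)
    return eps, delta


def privacy_without_replacement(n, N, k):
    """(epsilon, delta) bound for subsampling without replacement (requires N * k <= n)."""
    eps = math.log((n + 1) / (n + 1 - N * k))
    delta = N * k / n
    return eps, delta


if __name__ == "__main__":
    n, N, k = 60000, 10, 50  # illustrative sizes only
    print(privacy_with_replacement(n, N, k))
    print(privacy_without_replacement(n, N, k))
```
Note that both bounds depend only on $n$, $k$, and $N$ and not on the base learner, which is why the guarantee holds for any `fit`.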
Related papers
- Privacy for Free in the Over-Parameterized Regime [19.261178173399784]
Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data.
In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large $p$, privacy can be obtained for free, i.e., $\left|R_P\right| = o(1)$, not only when the privacy parameter $\varepsilon$ has constant order, but also in the strongly private setting $\varepsilon = o(1)$.
arXiv Detail & Related papers (2024-10-18T18:01:11Z)
- Differentially Private Statistical Inference through $\beta$-Divergence One Posterior Sampling [2.8544822698499255]
We propose a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process.
This provides private estimation that is generally applicable without requiring changes to the underlying model.
We show that $\beta$D-Bayes produces more precise estimation for the same privacy guarantees.
arXiv Detail & Related papers (2023-07-11T12:00:15Z)
- Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search [38.83524780461911]
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Pre-trained Perceptual Features Improve Differentially Private Image Generation [8.659595986100738]
Training even moderately-sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this guarantee is appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)