Optimal Unbiased Randomizers for Regression with Label Differential
Privacy
- URL: http://arxiv.org/abs/2312.05659v1
- Date: Sat, 9 Dec 2023 19:58:34 GMT
- Title: Optimal Unbiased Randomizers for Regression with Label Differential
Privacy
- Authors: Ashwinkumar Badanidiyuru and Badih Ghazi and Pritish Kamath and Ravi
Kumar and Ethan Leeman and Pasin Manurangsi and Avinash V Varadarajan and
Chiyuan Zhang
- Abstract summary: We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP).
We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets.
- Score: 61.63619647307816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new family of label randomizers for training regression models
under the constraint of label differential privacy (DP). In particular, we
leverage the trade-offs between bias and variance to construct better label
randomizers depending on a privately estimated prior distribution over the
labels. We demonstrate that these randomizers achieve state-of-the-art
privacy-utility trade-offs on several datasets, highlighting the importance of
reducing bias when training neural networks with label DP. We also provide
theoretical results shedding light on the structural properties of the optimal
unbiased randomizers.
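To make the bias-variance trade-off mentioned in the abstract concrete, here is a minimal Python sketch. It is not the paper's randomizer: it uses the standard Laplace mechanism as a simple example of an unbiased label randomizer, and compares it with a clipped variant that is biased but has lower variance. The function names, the label range `[low, high]`, and the choice `epsilon = 1.0` are assumptions made for illustration only.

```python
import numpy as np


def laplace_randomizer(labels, epsilon, low, high, rng):
    """Hypothetical helper: add Laplace noise to labels assumed to lie in [low, high].

    With scale (high - low) / epsilon this satisfies epsilon-label-DP, and it is
    unbiased for labels inside the range because the noise has zero mean.
    """
    labels = np.clip(labels, low, high)  # enforce the assumed label range
    scale = (high - low) / epsilon       # sensitivity of a single label / epsilon
    return labels + rng.laplace(loc=0.0, scale=scale, size=labels.shape)


def clipped_laplace_randomizer(labels, epsilon, low, high, rng):
    """Hypothetical helper: clip the noisy labels back into [low, high].

    Clipping is post-processing, so the privacy guarantee is unchanged, but the
    output is no longer unbiased; in exchange, its variance is smaller.
    """
    noisy = laplace_randomizer(labels, epsilon, low, high, rng)
    return np.clip(noisy, low, high)


rng = np.random.default_rng(0)
true_label = 0.9
y = np.full(200_000, true_label)  # many copies of one label, to estimate moments

unbiased = laplace_randomizer(y, epsilon=1.0, low=0.0, high=1.0, rng=rng)
clipped = clipped_laplace_randomizer(y, epsilon=1.0, low=0.0, high=1.0, rng=rng)

print("unbiased: mean error", unbiased.mean() - true_label, "variance", unbiased.var())
print("clipped:  mean error", clipped.mean() - true_label, "variance", clipped.var())
```

Running the sketch should show a mean error near zero with large variance for the first randomizer, and a clearly nonzero mean error with much smaller variance for the clipped one. The abstract's claim is that, for training neural networks with label DP, reducing this kind of bias matters.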
Related papers
- Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias [5.698050337128548]
Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples.
For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions.
We propose a novel confidence measure, called $\mathcal{T}$-similarity, built upon the prediction diversity of an ensemble of linear classifiers.
arXiv Detail & Related papers (2023-10-23T11:30:06Z)
- Pseudo-labeling for Kernel Ridge Regression under Covariate Shift [2.7920304852537536]
We learn a regression function with small mean squared error over a target distribution, based on unlabeled data from the target distribution and labeled data that may have a different feature distribution.
We propose to split the labeled data into two subsets and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model.
arXiv Detail & Related papers (2023-02-20T18:46:12Z)
- Regression with Label Differential Privacy [64.21020761920322]
We derive a label DP randomization mechanism that is optimal under a given regression loss function.
We prove that the optimal mechanism takes the form of a "randomized response on bins".
arXiv Detail & Related papers (2022-12-12T17:41:32Z)
- Partial sequence labeling with structured Gaussian Processes [8.239028141030621]
We propose structured Gaussian Processes for partial sequence labeling.
It encodes uncertainty in the prediction and does not need extra effort for model selection and hyperparameter learning.
It is evaluated on several sequence labeling tasks and the experimental results show the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-20T00:56:49Z)
- Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned either from unbalanced data or through mode collapse.
We argue that this bias is not only responsible for fairness concerns but also plays a key role in the collapse of latent-traversal editing methods when deviating away from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
- Debiased Graph Neural Networks with Agnostic Label Selection Bias [59.61301255860836]
Most existing Graph Neural Networks (GNNs) are proposed without considering the selection bias in data.
We propose a novel Debiased Graph Neural Network (DGNN) with a differentiated decorrelation regularizer.
Our proposed model outperforms the state-of-the-art methods and DGNN is a flexible framework to enhance existing GNNs.
arXiv Detail & Related papers (2022-01-19T16:50:29Z)
- Label differential privacy via clustering [27.485176618438842]
We present new mechanisms for differentially private machine learning that protect only the privacy of the labels in the training set.
Our mechanisms cluster the examples in the training set using their (non-private) feature vectors, randomly re-sample each label from examples in the same cluster, and output a training set with noisy labels as well as a modified version of the true loss function.
We prove that when the clusters are both large and high-quality, the model that minimizes the modified loss on the noisy training set converges to small excess risk at a rate that is comparable to the rate for non-private learning.
arXiv Detail & Related papers (2021-10-05T16:47:27Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.