Optimal Unbiased Randomizers for Regression with Label Differential Privacy
- URL: http://arxiv.org/abs/2312.05659v1
- Date: Sat, 9 Dec 2023 19:58:34 GMT
- Title: Optimal Unbiased Randomizers for Regression with Label Differential Privacy
- Authors: Ashwinkumar Badanidiyuru and Badih Ghazi and Pritish Kamath and Ravi Kumar and Ethan Leeman and Pasin Manurangsi and Avinash V Varadarajan and Chiyuan Zhang
- Abstract summary: We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP).
We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets.
- Score: 61.63619647307816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new family of label randomizers for training regression models
under the constraint of label differential privacy (DP). In particular, we
leverage the trade-offs between bias and variance to construct better label
randomizers depending on a privately estimated prior distribution over the
labels. We demonstrate that these randomizers achieve state-of-the-art
privacy-utility trade-offs on several datasets, highlighting the importance of
reducing bias when training neural networks with label DP. We also provide
theoretical results shedding light on the structural properties of the optimal
unbiased randomizers.
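As a point of reference for what "unbiased" means here, below is a minimal sketch of the simplest unbiased label randomizer, the Laplace mechanism on bounded labels. This is a baseline illustration only, not the paper's prior-dependent construction; the function name and parameters are chosen for this example.

```python
import numpy as np

def laplace_label_randomizer(labels, epsilon, label_range, rng):
    """Release noisy labels satisfying epsilon-label-DP.

    Adding zero-mean Laplace noise scaled to the label range is unbiased,
    E[noisy_label] = label, at the cost of variance 2 * (width / epsilon)^2.
    The paper's randomizers trade some of this variance for controlled bias
    using a privately estimated prior over the labels.
    """
    lo, hi = label_range
    scale = (hi - lo) / epsilon  # sensitivity of one label is the range width
    return labels + rng.laplace(0.0, scale, size=len(labels))

rng = np.random.default_rng(0)
labels = rng.uniform(0.0, 1.0, size=200_000)
noisy = laplace_label_randomizer(labels, epsilon=1.0, label_range=(0.0, 1.0), rng=rng)
```

With many samples, the empirical mean of the noisy labels tracks the true mean, which is what unbiasedness buys when the downstream loss averages over examples.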
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- A Debiased Nearest Neighbors Framework for Multi-Label Text Classification [13.30576550077694]
We introduce a DEbiased Nearest Neighbors (DENN) framework for Multi-Label Text Classification (MLTC).
To address embedding alignment bias, we propose a debiased contrastive learning strategy, enhancing neighbor consistency on label co-occurrence.
For confidence estimation bias, we present a debiased confidence estimation strategy, improving the adaptive combination of predictions from $k$NN and inductive binary classifications.
arXiv Detail & Related papers (2024-08-06T14:00:23Z)
- Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias [5.698050337128548]
Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples.
For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions.
We propose a novel confidence measure, called $\mathcal{T}$-similarity, built upon the prediction diversity of an ensemble of linear classifiers.
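A simplified agreement-based confidence score in that spirit can be sketched as follows. This is not the paper's exact $\mathcal{T}$-similarity definition; it is a hypothetical stand-in that scores a sample by how much the ensemble members' predicted distributions agree.

```python
import numpy as np

def ensemble_agreement_confidence(probs):
    """Mean pairwise inner product of class-probability vectors.

    probs has shape (m, n, k): m ensemble members, n samples, k classes.
    High agreement across members yields high confidence; unlike the raw
    softmax probability, a single overconfident member cannot dominate.
    """
    m = probs.shape[0]
    pair_sims = [
        (probs[i] * probs[j]).sum(axis=-1)
        for i in range(m)
        for j in range(i + 1, m)
    ]
    return np.mean(pair_sims, axis=0)

# Three members agreeing on sample 0 and disagreeing on sample 1.
probs = np.array([
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.5, 0.5]],
])
conf = ensemble_agreement_confidence(probs)
```

The disagreeing sample receives a much lower score, so it would not be pseudo-labeled early in self-training.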
arXiv Detail & Related papers (2023-10-23T11:30:06Z)
- Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift [1.3597551064547502]
We learn a regression function with small mean squared error over a target distribution, based on unlabeled data from the target distribution and labeled data that may have a different feature distribution.
We propose to split the labeled data into two subsets, and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model.
Our estimator achieves the minimax optimal error rate up to a polylogarithmic factor, and we find that using pseudo-labels for model selection does not significantly hinder performance.
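The split-and-impute procedure described above can be sketched with a toy Gaussian-kernel ridge regression. The bandwidths, split sizes, and data here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def krr_fit(X, y, lam, gamma):
    """Gaussian-kernel ridge regression in dual form: alpha = (K + lam*I)^-1 y."""
    K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    K = np.exp(-gamma * ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1))
    return K @ alpha

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
X_target = rng.uniform(-1, 1, size=(100, 1))  # unlabeled target features

# Split labeled data: half A trains candidate models, half B an imputation model.
(X_a, y_a), (X_b, y_b) = (X[:100], y[:100]), (X[100:], y[100:])
alpha_imp = krr_fit(X_b, y_b, lam=0.1, gamma=5.0)
pseudo = krr_predict(X_b, alpha_imp, X_target, gamma=5.0)  # pseudo-labels

# Select the candidate bandwidth whose target predictions best match the
# pseudo-labels, i.e. model selection without any labeled target data.
candidates = (0.5, 2.0, 5.0, 20.0)
best = min(
    candidates,
    key=lambda g: np.mean(
        (krr_predict(X_a, krr_fit(X_a, y_a, 0.1, g), X_target, g) - pseudo) ** 2
    ),
)
```

The point of the abstract's claim is that scoring candidates against pseudo-labels, as in the last step, costs only a polylogarithmic factor relative to having true target labels.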
arXiv Detail & Related papers (2023-02-20T18:46:12Z)
- Regression with Label Differential Privacy [64.21020761920322]
We derive a label DP randomization mechanism that is optimal under a given regression loss function.
We prove that the optimal mechanism takes the form of a "randomized response on bins".
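A minimal sketch of the "randomized response on bins" idea: discretize the label space and apply k-ary randomized response over the bins. The bin placement and keep-probability calibration below are illustrative, not the paper's optimized construction.

```python
import numpy as np

def rr_on_bins(labels, bin_centers, epsilon, rng):
    """k-ary randomized response over a discretized label space.

    Each label is snapped to its nearest bin center; with probability
    p_keep the true bin is reported, otherwise a bin is drawn uniformly.
    p_keep is calibrated so the mechanism is epsilon-DP for the label:
    P(report true bin) / P(report any other bin) = exp(epsilon).
    """
    k = len(bin_centers)
    idx = np.argmin(np.abs(labels[:, None] - bin_centers[None, :]), axis=1)
    p_keep = (np.exp(epsilon) - 1.0) / (np.exp(epsilon) + k - 1.0)
    keep = rng.random(len(labels)) < p_keep
    out_idx = np.where(keep, idx, rng.integers(0, k, size=len(labels)))
    return bin_centers[out_idx]

rng = np.random.default_rng(1)
centers = np.linspace(0.05, 0.95, 10)
noisy = rr_on_bins(rng.uniform(0.0, 1.0, 1000), centers, epsilon=2.0, rng=rng)
```

Note that this plain version is biased toward the bin-center average; the present paper's contribution is precisely to post-process such randomizers into unbiased ones.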
arXiv Detail & Related papers (2022-12-12T17:41:32Z)
- Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned either from unbalanced data or through mode collapse.
We argue that this bias is responsible not only for fairness concerns, but also plays a key role in the collapse of latent-traversal editing methods when deviating from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
- Label differential privacy via clustering [27.485176618438842]
We present new mechanisms for differentially private machine learning that protect only the privacy of the labels in the training set.
Our mechanisms cluster the examples in the training set using their (non-private) feature vectors, randomly re-sample each label from examples in the same cluster, and output a training set with noisy labels as well as a modified version of the true loss function.
We prove that when the clusters are both large and high-quality, the model that minimizes the modified loss on the noisy training set converges to small excess risk at a rate that is comparable to the rate for non-private learning.
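The resampling step of that mechanism might look roughly like the sketch below. Cluster assignments are taken as given, and the paper's full mechanism also outputs a modified loss function; this shows only the label-resampling idea.

```python
import numpy as np

def resample_labels_within_clusters(labels, cluster_ids, rng):
    """Replace each example's label with one drawn uniformly from its cluster.

    With large, high-quality clusters, the resampled labels within a
    cluster approximate the true label distribution there, while any
    single example's label has limited influence on the released data.
    """
    noisy = np.empty_like(labels)
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        noisy[members] = rng.choice(labels[members], size=len(members))
    return noisy

rng = np.random.default_rng(2)
labels = np.array([0, 0, 1, 1, 5, 5, 6, 6])
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1])
noisy = resample_labels_within_clusters(labels, clusters, rng)
```

Each released label is always some true label from the same cluster, which is why cluster quality directly controls the excess risk.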
arXiv Detail & Related papers (2021-10-05T16:47:27Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.