Label differential privacy via clustering
- URL: http://arxiv.org/abs/2110.02159v1
- Date: Tue, 5 Oct 2021 16:47:27 GMT
- Title: Label differential privacy via clustering
- Authors: Hossein Esfandiari, Vahab Mirrokni, Umar Syed, Sergei Vassilvitskii
- Abstract summary: We present new mechanisms for label differential privacy, a relaxation of differentially private machine learning that protects only the privacy of the labels in the training set.
Our mechanisms cluster the examples in the training set using their (non-private) feature vectors, randomly re-sample each label from examples in the same cluster, and output a training set with noisy labels as well as a modified version of the true loss function.
We prove that when the clusters are both large and high-quality, the model that minimizes the modified loss on the noisy training set converges to small excess risk at a rate that is comparable to the rate for non-private learning.
- Score: 27.485176618438842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present new mechanisms for \emph{label differential privacy}, a relaxation
of differentially private machine learning that only protects the privacy of
the labels in the training set. Our mechanisms cluster the examples in the
training set using their (non-private) feature vectors, randomly re-sample each
label from examples in the same cluster, and output a training set with noisy
labels as well as a modified version of the true loss function. We prove that
when the clusters are both large and high-quality, the model that minimizes the
modified loss on the noisy training set converges to small excess risk at a
rate that is comparable to the rate for non-private learning. We describe both
a centralized mechanism in which the entire training set is stored by a trusted
curator, and a distributed mechanism where each user stores a single labeled
example and replaces her label with the label of a randomly selected user from
the same cluster. We also describe a learning problem in which large clusters
are necessary to achieve both strong privacy and either good precision or good
recall. Our experiments show that randomizing the labels within each cluster
significantly improves the privacy vs. accuracy trade-off compared to applying
uniform randomized response to the labels, and also compared to learning a
model via DP-SGD.
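To make the mechanism described above concrete, here is a minimal sketch of the core idea: cluster the examples on their (non-private) features, then replace each label with the label of a randomly chosen example in the same cluster, alongside the uniform randomized response baseline the experiments compare against. This is an illustration under assumptions, not the paper's implementation: the choice of KMeans, all function names, and all parameters are hypothetical, the paper's modified (debiased) loss is omitted, and, per the abstract, the privacy guarantee of within-cluster resampling depends on clusters being large.

```python
# Hypothetical sketch of the two label mechanisms compared in the abstract.
# Omits the paper's modified loss; names and parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def cluster_resample_labels(X, y, n_clusters=20, seed=0):
    """Replace each example's label with the label of a uniformly random
    example from the same cluster (mirrors the distributed mechanism)."""
    rng = np.random.default_rng(seed)
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(X)
    y_noisy = y.copy()
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        # Each example draws its new label from a random example in its cluster.
        y_noisy[idx] = y[idx][rng.integers(0, idx.size, size=idx.size)]
    return y_noisy

def uniform_randomized_response(y, n_labels, epsilon, seed=0):
    """Classical k-ary randomized response baseline: keep the true label with
    probability e^eps / (e^eps + k - 1); otherwise emit a uniform other label."""
    rng = np.random.default_rng(seed)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + n_labels - 1)
    flip = rng.random(y.shape[0]) >= p_keep
    offsets = rng.integers(1, n_labels, size=y.shape[0])  # shift of 1..k-1
    return np.where(flip, (y + offsets) % n_labels, y)

# Usage on synthetic data:
X = np.random.randn(2000, 16)
y = np.random.randint(0, 10, size=2000)
y_cluster = cluster_resample_labels(X, y)                 # paper-style mechanism
y_rr = uniform_randomized_response(y, 10, epsilon=1.0)    # uniform RR baseline
```

The contrast the experiments point to is visible in the sketch: uniform randomized response mixes every label toward the uniform distribution, while within-cluster resampling mixes each label only toward the label distribution of similar examples, preserving more signal when clusters are high-quality.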
Related papers
- Federated Learning with Only Positive Labels by Exploring Label Correlations [78.59613150221597]
Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints.
In this paper, we study the multi-label classification problem under the federated learning setting.
We propose a novel and generic method termed Federated Averaging by exploring Label Correlations (FedALC).
arXiv Detail & Related papers (2024-04-24T02:22:50Z)
- Pairwise Similarity Distribution Clustering for Noisy Label Learning [0.0]
Noisy label learning aims to train deep neural networks using a large amount of samples with noisy labels.
We propose a simple yet effective sample selection algorithm to divide the training samples into one clean set and another noisy set.
Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T11:30:22Z)
- Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning.
This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z)
- Optimal Unbiased Randomizers for Regression with Label Differential Privacy [61.63619647307816]
We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP).
We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets.
arXiv Detail & Related papers (2023-12-09T19:58:34Z)
- Label Inference Attack against Split Learning under Regression Setting [24.287752556622312]
We study label leakage in the regression setting, where the private labels are continuous numbers.
We propose a novel learning-based attack that integrates gradient information and extra learning regularization objectives.
arXiv Detail & Related papers (2023-01-18T03:17:24Z)
- Regularizing Neural Network Training via Identity-wise Discriminative Feature Suppression [20.89979858757123]
When the number of training samples is small, or the class labels are noisy, networks tend to memorize patterns specific to individual instances to minimize the training error.
This paper explores a remedy by suppressing the network's tendency to rely on instance-specific patterns for empirical error minimization.
arXiv Detail & Related papers (2022-09-29T05:14:56Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR-10/CIFAR-100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.