Personalized PATE: Differential Privacy for Machine Learning with
Individual Privacy Guarantees
- URL: http://arxiv.org/abs/2202.10517v2
- Date: Wed, 23 Feb 2022 21:27:20 GMT
- Title: Personalized PATE: Differential Privacy for Machine Learning with
Individual Privacy Guarantees
- Authors: Christopher Mühl, Franziska Boenisch
- Abstract summary: We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
- Score: 1.2691047660244335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applying machine learning (ML) to sensitive domains requires privacy
protection of the underlying training data through formal privacy frameworks,
such as differential privacy (DP). Yet, usually, the privacy of the training
data comes at the cost of the resulting ML models' utility. One reason for
this is that DP uses one homogeneous privacy budget epsilon for all training
data points, which has to align with the strictest privacy requirement
encountered among all data holders. In practice, different data holders might
have different privacy requirements and data points of data holders with lower
requirements could potentially contribute more information to the training
process of the ML models. To account for this possibility, we propose three
novel methods that extend the DP framework Private Aggregation of Teacher
Ensembles (PATE) to support training an ML model with different personalized
privacy guarantees within the training data. We formally describe the methods,
provide theoretical analyses of their privacy bounds, and experimentally
evaluate their effect on the final model's utility on the MNIST
and Adult income datasets. Our experiments show that our personalized privacy
methods yield higher accuracy models than the non-personalized baseline.
Thereby, our methods can improve the privacy-utility trade-off in scenarios in
which different data holders consent to contribute their sensitive data at
different privacy levels.
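For orientation (not part of the paper's formal treatment): in standard DP, a mechanism $M$ must satisfy $\Pr[M(D) \in S] \le e^{\epsilon}\,\Pr[M(D') \in S] + \delta$ for all datasets $D$, $D'$ differing in one data point, with a single budget $\epsilon$; a common personalized relaxation instead assigns each data point $x_i$ its own budget $\epsilon_i$ and requires the bound with $\epsilon_i$ whenever $D$ and $D'$ differ in $x_i$. The sketch below shows the standard noisy-argmax PATE aggregation step and one hypothetical way a personalized budget could enter (weighting teachers by their data holders' privacy group). This is an illustrative assumption, not a description of the paper's three methods, and the function names are invented for the example.

    import numpy as np

    def pate_aggregate(teacher_votes, num_classes, noise_scale, rng=None):
        # Standard (non-personalized) PATE step: add Laplace noise to the
        # teachers' vote histogram and return the noisy argmax. Roughly,
        # Laplace scale b costs about eps = 2/b per answered query under the
        # basic Laplace noisy-max analysis.
        rng = np.random.default_rng() if rng is None else rng
        hist = np.bincount(teacher_votes, minlength=num_classes).astype(float)
        hist += rng.laplace(scale=noise_scale, size=num_classes)
        return int(np.argmax(hist))

    def weighted_pate_aggregate(teacher_votes, teacher_weights, num_classes,
                                noise_scale, rng=None):
        # Hypothetical personalized variant: teachers trained on data from
        # holders with looser privacy requirements get larger vote weights,
        # so their data contributes more signal for the same noise level.
        rng = np.random.default_rng() if rng is None else rng
        hist = np.zeros(num_classes)
        for vote, weight in zip(teacher_votes, teacher_weights):
            hist[vote] += weight
        hist += rng.laplace(scale=noise_scale, size=num_classes)
        return int(np.argmax(hist))

    # Toy usage: 10 teachers label a 3-class query; the last 5 teachers hold
    # data from holders who accepted a weaker privacy level.
    votes = np.array([0, 0, 1, 0, 2, 0, 1, 0, 0, 2])
    weights = np.array([1.0] * 5 + [2.0] * 5)
    print(pate_aggregate(votes, num_classes=3, noise_scale=1.0))
    print(weighted_pate_aggregate(votes, weights, num_classes=3, noise_scale=1.0))

Note that any such reweighting changes the sensitivity of the vote histogram (a single teacher can now shift counts by its weight rather than by 1), which is exactly the kind of effect a formal privacy analysis has to track.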
Related papers
- FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments.
We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z)
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive temporal regions without DP application, or for combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
- Personalized Differential Privacy for Ridge Regression [3.4751583941317166]
We introduce our novel Personalized-DP Output Perturbation method (PDP-OP), which enables training Ridge regression models with individual per-data-point privacy levels.
We provide rigorous privacy proofs for our PDP-OP as well as accuracy guarantees for the resulting model.
We show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al.
arXiv Detail & Related papers (2024-01-30T16:00:14Z)
- Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs.
PrivChatGPT consists of two main components: preserving user privacy during data curation/pre-processing (together with preserving private context), and a private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a differential privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z)
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Production of Categorical Data Verifying Differential Privacy: Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
arXiv Detail & Related papers (2022-04-02T12:50:14Z)
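The last entry above mentions the local DP (LDP) model, in which users randomize categorical values before sending them to a server. As a generic illustration of that setting, the following is a minimal sketch of k-ary (generalized) randomized response, a canonical LDP mechanism for categorical data; it is not necessarily the exact protocol evaluated in that paper, and the function names are invented for the example.

    import numpy as np

    def grr_perturb(true_value, domain_size, epsilon, rng=None):
        # k-ary randomized response: report the true category with
        # probability p = e^eps / (e^eps + k - 1), otherwise report one of
        # the other k - 1 categories uniformly. This satisfies eps-LDP.
        rng = np.random.default_rng() if rng is None else rng
        k = domain_size
        p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
        if rng.random() < p:
            return int(true_value)
        others = [v for v in range(k) if v != true_value]
        return int(rng.choice(others))

    def grr_estimate_frequencies(reports, domain_size, epsilon):
        # Unbiased frequency estimation: invert the known perturbation
        # probabilities p (truthful report) and q (each specific wrong answer).
        k, n = domain_size, len(reports)
        p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
        q = 1.0 / (np.exp(epsilon) + k - 1)
        counts = np.bincount(reports, minlength=k)
        return (counts / n - q) / (p - q)

    # Toy usage: 10,000 users each hold one of 4 categories.
    rng = np.random.default_rng(0)
    true_values = rng.integers(0, 4, size=10_000)
    reports = np.array([grr_perturb(v, 4, epsilon=1.0, rng=rng) for v in true_values])
    print(grr_estimate_frequencies(reports, 4, epsilon=1.0))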