Privacy Regularization: Joint Privacy-Utility Optimization in Language
Models
- URL: http://arxiv.org/abs/2103.07567v1
- Date: Fri, 12 Mar 2021 23:17:43 GMT
- Title: Privacy Regularization: Joint Privacy-Utility Optimization in Language
Models
- Authors: Fatemehsadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor
Rühle, Taylor Berg-Kirkpatrick, Robert Sim
- Abstract summary: We introduce two privacy-preserving regularization methods for training language models.
We show the advantages of our regularizers, including a favorable utility-privacy trade-off.
- Score: 27.389684148671858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural language models are known to have a high capacity for memorization of
training samples. This may have serious privacy implications when training
models on user content such as email correspondence. Differential privacy (DP),
a popular choice to train models with privacy guarantees, comes with
significant costs in terms of utility degradation and disparate impact on
subgroups of users. In this work, we introduce two privacy-preserving
regularization methods for training language models that enable joint
optimization of utility and privacy through (1) the use of a discriminator and
(2) the inclusion of a triplet-loss term. We compare our methods with DP
through extensive evaluation. We show the advantages of our regularizers with
favorable utility-privacy trade-off, faster training with the ability to tap
into existing optimization approaches, and ensuring uniform treatment of
under-represented subgroups.
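As a rough illustration of what "joint optimization of utility and privacy" through a triplet-loss term could look like, the sketch below adds a weighted hinge-style triplet term to a language-modeling loss. It is a generic sketch, not the authors' formulation: the abstract does not say how triplets are built from user data, which distance is used, or how the term is weighted, so the function names, the Euclidean distance, the `margin`, and the `weight` are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_term(anchor, positive, negative, margin=1.0):
    """Generic hinge triplet term: max(0, d(a, p) - d(a, n) + margin).

    Which representations play the anchor/positive/negative roles is a
    design choice; the abstract does not specify it (assumption).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def joint_loss(lm_loss, privacy_term, weight=0.5):
    """Utility loss plus a weighted privacy regularizer (hypothetical weighting)."""
    return lm_loss + weight * privacy_term


# Toy hidden representations (made-up values, for illustration only).
anchor = [0.0, 0.0]
positive = [3.0, 4.0]
negative = [0.0, 0.0]

reg = triplet_term(anchor, positive, negative, margin=1.0)
total = joint_loss(lm_loss=2.0, privacy_term=reg, weight=0.5)
print(reg, total)
```

The weight plays the role of the privacy-utility knob: a larger value pushes training toward the regularizer at the expense of the language-modeling objective.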
Related papers
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Personalized Adaptation via In-Context Preference Learning [20.042909385219716]
Preference Pretrained Transformer (PPT) is a novel approach for adaptive personalization using online user feedback.
Our results suggest the potential of in-context learning for scalable and efficient personalization in large language models.
arXiv Detail & Related papers (2024-10-17T20:06:02Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs.
PrivChatGPT consists of two main components i.e., preserving user privacy during the data curation/pre-processing together with preserving private context and the private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Advancing Personalized Federated Learning: Group Privacy, Fairness, and Beyond [6.731000738818571]
Federated learning (FL) is a framework for training machine learning models in a distributed and collaborative manner.
In this paper, we address the triadic interaction among personalization, privacy guarantees, and fairness attained by models trained within the FL framework.
A method is put forth that introduces group privacy assurances through the utilization of $d$-privacy.
arXiv Detail & Related papers (2023-09-01T12:20:19Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- On the utility and protection of optimization with differential privacy and classic regularization techniques [9.413131350284083]
We study the effectiveness of the differentially-private stochastic gradient descent (DP-SGD) algorithm against standard optimization practices with regularization techniques.
We discuss differential privacy's flaws and limits and empirically demonstrate the often superior privacy-preserving properties of dropout and l2-regularization.
arXiv Detail & Related papers (2022-09-07T14:10:21Z)
- Personalized PATE: Differential Privacy for Machine Learning with Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.