Selective Differential Privacy for Language Modeling
- URL: http://arxiv.org/abs/2108.12944v1
- Date: Mon, 30 Aug 2021 01:11:10 GMT
- Title: Selective Differential Privacy for Language Modeling
- Authors: Weiyan Shi, Aiqi Cui, Evan Li, Ruoxi Jia, Zhou Yu
- Abstract summary: Previous work has attempted to tackle the challenge of privacy leakage by training RNN-based language models with differential privacy guarantees.
We propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data.
Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utilities.
- Score: 36.64464956102432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing adoption of language models in applications involving
sensitive data, it has become crucial to protect these models from leaking
private information. Previous work has attempted to tackle this challenge by
training RNN-based language models with differential privacy guarantees.
However, applying classical differential privacy to language models leads to
poor model performance as the underlying privacy notion is over-pessimistic and
provides undifferentiated protection for all tokens of the data. Given that the
private information in natural language is sparse (for example, the bulk of an
email might not carry personally identifiable information), we propose a new
privacy notion, selective differential privacy, to provide rigorous privacy
guarantees on the sensitive portion of the data to improve model utility. To
realize such a new notion, we develop a corresponding privacy mechanism,
Selective-DPSGD, for RNN-based language models. Besides language modeling, we
also apply the method to a more concrete application -- dialog systems.
Experiments on both language modeling and dialog system building show that the
proposed privacy-preserving mechanism achieves better utilities while remaining
safe under various privacy attacks compared to the baselines. The data, code
and models are available at https://github.com/wyshi/lm_privacy.
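To make the selective notion concrete, below is a minimal, hedged sketch of the general idea behind a selective DP-SGD update: token-level losses that a policy function flags as sensitive receive per-example gradient clipping and Gaussian noise (as in DP-SGD), while the remaining tokens contribute an ordinary gradient. The toy RNN, the `policy_fn` heuristic, and all hyperparameters are illustrative placeholders, not the authors' actual Selective-DPSGD implementation (see the linked repository for that).

```python
# Illustrative sketch only: sensitive tokens get a DP-SGD-style noisy,
# clipped update; non-sensitive tokens get an ordinary gradient step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRNNLM(nn.Module):
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

def policy_fn(token_ids):
    # Hypothetical policy: pretend token ids below 10 are sensitive
    # (a real policy might flag digits, names, or addresses).
    return token_ids < 10

def selective_dp_step(model, x, y, lr=0.1, clip_norm=1.0, noise_mult=0.5):
    logits = model(x)                                          # (B, T, V)
    loss_tok = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               y.reshape(-1), reduction="none").view_as(y)
    sens = policy_fn(y).float()                                # 1 = sensitive token

    # Ordinary gradient on the non-sensitive tokens.
    nonpriv_loss = (loss_tok * (1 - sens)).sum() / (1 - sens).sum().clamp(min=1)
    grads_np = torch.autograd.grad(nonpriv_loss, model.parameters(),
                                   retain_graph=True)

    # DP-SGD-style gradient on the sensitive tokens: clip each example's
    # gradient to clip_norm, sum, add Gaussian noise, and average.
    priv = [torch.zeros_like(p) for p in model.parameters()]
    for i in range(x.size(0)):
        loss_i = (loss_tok[i] * sens[i]).sum()
        g_i = torch.autograd.grad(loss_i, model.parameters(), retain_graph=True)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in g_i))
        scale = (clip_norm / (norm + 1e-12)).clamp(max=1.0)
        for acc, g in zip(priv, g_i):
            acc.add_(g * scale)
    for acc in priv:
        acc.add_(torch.randn_like(acc), alpha=noise_mult * clip_norm)
        acc.div_(x.size(0))

    with torch.no_grad():                                      # plain SGD update
        for p, g_np, g_p in zip(model.parameters(), grads_np, priv):
            p.add_(g_np + g_p, alpha=-lr)

model = TinyRNNLM()
x = torch.randint(0, 100, (4, 12))   # toy batch of token ids
y = torch.randint(0, 100, (4, 12))   # next-token targets
selective_dp_step(model, x, y)
```

The intended benefit, per the abstract, is that noise and clipping (and hence the privacy accounting) only need to cover the sensitive portion of the data, which is what allows better utility than applying DP-SGD uniformly to every token.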
Related papers
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models trained with or without any particular privacy unit are 'almost indistinguishable'.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer [57.04801796205638]
Large Language Models (LLMs) have emerged as dominant tools for various tasks.
However, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information.
We present Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge.
arXiv Detail & Related papers (2023-11-27T02:01:10Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Can Language Models be Instructed to Protect Personal Information? [30.187731765653428]
We introduce PrivQA -- a benchmark to assess the privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.
We find that adversaries can easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs.
We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections.
arXiv Detail & Related papers (2023-10-03T17:30:33Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack; a sketch of how canary exposure is typically measured appears after this list.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Privacy-Adaptive BERT for Natural Language Understanding [20.821155542969947]
We study how to improve the effectiveness of NLU models under a Local Privacy setting using BERT.
We propose privacy-adaptive LM pretraining methods and demonstrate that they can significantly improve model performance on privatized text input.
arXiv Detail & Related papers (2021-04-15T15:01:28Z)
- Differentially Private Language Models Benefit from Public Pre-training [1.2676356746752895]
We study the feasibility of learning a language model which is simultaneously high-quality and privacy preserving.
We find that DP fine-tuning boosts the performance of language models in the private domain.
arXiv Detail & Related papers (2020-09-13T00:50:44Z)
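For reference, the canary insertion attack mentioned in the "Just Fine-tune Twice" entry above is commonly evaluated with an exposure metric: a random "canary" sequence is inserted into the training data, and after training it is ranked against a pool of candidate sequences by model perplexity. The sketch below uses a placeholder scoring function and a toy candidate pool; it illustrates the metric only and is not the exact setup of any paper listed here.

```python
# Hedged sketch of the canary-exposure evaluation; log_perplexity and the
# candidate pool are illustrative placeholders.
import math
import random

def log_perplexity(model, sequence):
    # Placeholder: return the model's negative log-likelihood of `sequence`.
    # In practice this would sum token log-probabilities from the trained LM.
    return model(sequence)

def exposure(model, true_canary, candidates):
    """Exposure = log2(#candidates) - log2(rank of the true canary),
    where candidates are ranked by log-perplexity (lower = more likely).
    High exposure suggests the model memorized the inserted canary."""
    scores = {c: log_perplexity(model, c) for c in candidates}
    ranked = sorted(candidates, key=lambda c: scores[c])
    rank = ranked.index(true_canary) + 1          # 1-indexed rank
    return math.log2(len(candidates)) - math.log2(rank)

# Toy usage with a fake "model" that strongly prefers the true canary,
# mimicking a model that memorized it during training.
true_canary = "the secret number is 042187"
candidates = [true_canary] + [
    f"the secret number is {random.randrange(10**6):06d}" for _ in range(999)
]
fake_model = lambda s: 0.0 if s == true_canary else 1.0 + random.random()
print(f"exposure = {exposure(fake_model, true_canary, candidates):.2f}")
```

An exposure close to log2 of the pool size indicates memorization, while a low exposure means the canary is not ranked much better than a random candidate.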