Provably Confidential Language Modelling
- URL: http://arxiv.org/abs/2205.01863v1
- Date: Wed, 4 May 2022 02:33:45 GMT
- Title: Provably Confidential Language Modelling
- Authors: Xuandong Zhao, Lei Li, Yu-Xiang Wang
- Abstract summary: We propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments.
We show that our method is able to provably prevent unintended memorization by randomizing parts of the training process.
Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality.
- Score: 36.37616789197548
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models have been shown to memorize private information, such
as social security numbers, contained in their training data. Given the sheer scale of the
training corpus, it is challenging to screen and filter this private data, either
manually or automatically. In this paper, we propose Confidentially Redacted
Training (CRT), a method to train language generation models while protecting
the confidential segments. We borrow ideas from differential privacy (which
solves a related but distinct problem) and show that our method is able to
provably prevent unintended memorization by randomizing parts of the training
process. Moreover, we show that redaction with an approximately correct
screening policy amplifies the confidentiality guarantee. We implement the
method for both LSTM and GPT language models. Our experimental results show
that the models trained by CRT obtain almost the same perplexity while
preserving strong confidentiality.
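The abstract describes CRT as combining a screening/redaction step with randomized training updates. Below is a minimal illustrative sketch of that general recipe, assuming a toy regex-based screening policy, a placeholder redaction token, and a DP-SGD-style clipped-and-noised update for examples the screener flags; the function names, noise scale, and routing rule are assumptions for illustration, not the paper's exact algorithm.

```python
# Minimal illustrative sketch, NOT the paper's exact CRT algorithm.
# Assumptions: a toy regex screening policy, a <REDACTED> placeholder token,
# and a DP-SGD-style clipped-and-noised update for flagged examples.
import re
import numpy as np

REDACTED = "<REDACTED>"
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # hypothetical screening policy

def screen_and_redact(text):
    """Replace flagged confidential segments; report whether any were found."""
    redacted, n_hits = SSN_PATTERN.subn(REDACTED, text)
    return redacted, n_hits > 0

def noisy_sgd_step(params, grad, lr=0.1, clip=1.0, sigma=1.0):
    """DP-SGD-style randomized update: clip the gradient's L2 norm, add Gaussian noise."""
    grad = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    grad = grad + np.random.normal(0.0, sigma * clip, size=grad.shape)
    return params - lr * grad

def plain_sgd_step(params, grad, lr=0.1):
    """Ordinary update for examples believed to contain no confidential text."""
    return params - lr * grad

# Toy training loop: examples where the screener fired get the randomized update.
params = np.zeros(8)
corpus = ["my ssn is 123-45-6789, please remember it",
          "the meeting was rescheduled to next tuesday"]
for text in corpus:
    text, flagged = screen_and_redact(text)
    grad = np.random.randn(8)  # stand-in for the gradient of a real LM loss
    params = noisy_sgd_step(params, grad) if flagged else plain_sgd_step(params, grad)
```

As in DP-SGD, the noise in the randomized branch is calibrated to the clipping norm, which is what makes that part of the training process amenable to formal, differential-privacy-style analysis.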
Related papers
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not; the two models evaluated do so 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Can Language Models be Instructed to Protect Personal Information? [30.187731765653428]
We introduce PrivQA -- a benchmark to assess the privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.
We find that adversaries can easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs.
We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections.
arXiv Detail & Related papers (2023-10-03T17:30:33Z) - Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Training Natural Language Processing Models on Encrypted Text for
Enhanced Privacy [0.0]
We propose a method for training NLP models on encrypted text data to mitigate data privacy concerns.
Our results indicate that both encrypted and non-encrypted models achieve comparable performance.
arXiv Detail & Related papers (2023-05-03T00:37:06Z) - Mitigating Approximate Memorization in Language Models via Dissimilarity
Learned Policy [0.0]
Large language models (LLMs) are trained on large amounts of data.
LLMs have been shown to memorize parts of their training data and to emit that data verbatim when an adversary prompts them appropriately.
arXiv Detail & Related papers (2023-05-02T15:53:28Z) - Planting and Mitigating Memorized Content in Predictive-Text Language
Models [11.911353678499008]
Language models are widely deployed to provide automatic text completion services in user products.
Recent research has revealed that language models bear considerable risk of memorizing private training data.
In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text.
arXiv Detail & Related papers (2022-12-16T17:57:14Z) - Preventing Verbatim Memorization in Language Models Gives a False Sense
of Privacy [91.98116450958331]
We argue that verbatim memorization definitions are too restrictive and fail to capture more subtle forms of memorization.
Specifically, we design and implement an efficient defense that perfectly prevents all verbatim memorization (a sketch of this style of defense appears after this list).
We conclude by discussing potential alternative definitions and why defining memorization is a difficult yet crucial open question for neural language models.
arXiv Detail & Related papers (2022-10-31T17:57:55Z) - Mitigating Unintended Memorization in Language Models via Alternating
Teaching [15.112637366882185]
We propose a novel approach to mitigate unintended memorization in sequential modeling.
In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect.
Experiments on LibriSpeech datasets show that the proposed method achieves superior privacy-preserving results.
arXiv Detail & Related papers (2022-10-13T06:26:41Z) - Defending against Reconstruction Attacks with Rényi Differential
Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees against reconstruction attacks that are better than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
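The entry "Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy" above discusses defenses that block a model from regenerating training text word for word. The sketch below shows the kind of n-gram filter that entry critiques, assuming a hash-set of training n-grams and a greedy word-level decoder; the helper names (`training_ngrams`, `filtered_decode`), the 3-gram length, and the toy candidate generator are hypothetical, not that paper's implementation.

```python
# Illustrative sketch of a verbatim-memorization filter; assumptions only.
N = 3  # block any generated 3-gram that appears verbatim in the training data

def training_ngrams(corpus, n=N):
    """Collect every word-level n-gram that occurs in the training corpus."""
    grams = set()
    for doc in corpus:
        toks = doc.split()
        grams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return grams

def filtered_decode(next_token_candidates, prefix, blocked, max_len=4):
    """Greedy decoding that rejects any candidate completing a blocked n-gram."""
    out = list(prefix)
    for _ in range(max_len):
        for tok in next_token_candidates(out):
            if tuple(out[-(N - 1):] + [tok]) not in blocked:
                out.append(tok)
                break
        else:
            return out  # every candidate would reproduce training text verbatim
    return out

# Toy usage: a fixed candidate list stands in for a real model's top-k tokens.
blocked = training_ngrams(["the secret key is 42"])
print(filtered_decode(lambda ctx: ["is", "42", "done"], ["the", "secret", "key"], blocked))
# The blocked training 3-gram "key is 42" is never produced verbatim.
```

As that entry argues, blocking exact n-grams still leaves more subtle, near-verbatim forms of memorization recoverable, which is why such filters can give a false sense of privacy.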