Provably Confidential Language Modelling
- URL: http://arxiv.org/abs/2205.01863v1
- Date: Wed, 4 May 2022 02:33:45 GMT
- Title: Provably Confidential Language Modelling
- Authors: Xuandong Zhao, Lei Li, Yu-Xiang Wang
- Abstract summary: We propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments.
We show that our method is able to provably prevent unintended memorization by randomizing parts of the training process.
Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality.
- Score: 36.37616789197548
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models have been shown to memorize private information, such
as social security numbers, contained in their training data. Given the sheer scale of the
training corpus, it is challenging to screen and filter this private data, either
manually or automatically. In this paper, we propose Confidentially Redacted
Training (CRT), a method to train language generation models while protecting
the confidential segments. We borrow ideas from differential privacy (which
solves a related but distinct problem) and show that our method is able to
provably prevent unintended memorization by randomizing parts of the training
process. Moreover, we show that redaction with an approximately correct
screening policy amplifies the confidentiality guarantee. We implement the
method for both LSTM and GPT language models. Our experimental results show
that the models trained by CRT obtain almost the same perplexity while
preserving strong confidentiality.
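The abstract describes CRT as combining a screening/redaction step with randomized training updates. Below is a minimal illustrative sketch of that general recipe, assuming a toy regex-based screening policy, a placeholder redaction token, and a DP-SGD-style clipped-and-noised update for examples the screener flags; the function names, noise scale, and routing rule are assumptions for illustration, not the paper's exact algorithm.

```python
# Minimal illustrative sketch, NOT the paper's exact CRT algorithm.
# Assumptions: a toy regex screening policy, a <REDACTED> placeholder token,
# and a DP-SGD-style clipped-and-noised update for flagged examples.
import re
import numpy as np

REDACTED = "<REDACTED>"
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # hypothetical screening policy

def screen_and_redact(text):
    """Replace flagged confidential segments; report whether any were found."""
    redacted, n_hits = SSN_PATTERN.subn(REDACTED, text)
    return redacted, n_hits > 0

def noisy_sgd_step(params, grad, lr=0.1, clip=1.0, sigma=1.0):
    """DP-SGD-style randomized update: clip the gradient's L2 norm, add Gaussian noise."""
    grad = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    grad = grad + np.random.normal(0.0, sigma * clip, size=grad.shape)
    return params - lr * grad

def plain_sgd_step(params, grad, lr=0.1):
    """Ordinary update for examples believed to contain no confidential text."""
    return params - lr * grad

# Toy training loop: examples where the screener fired get the randomized update.
params = np.zeros(8)
corpus = ["my ssn is 123-45-6789, please remember it",
          "the meeting was rescheduled to next tuesday"]
for text in corpus:
    text, flagged = screen_and_redact(text)
    grad = np.random.randn(8)  # stand-in for the gradient of a real LM loss
    params = noisy_sgd_step(params, grad) if flagged else plain_sgd_step(params, grad)
```

As in DP-SGD, the noise in the randomized branch is calibrated to the clipping norm, which is what makes that part of the training process amenable to formal, differential-privacy-style analysis.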
Related papers
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not; the two models evaluated do so 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Can Language Models be Instructed to Protect Personal Information? [30.187731765653428]
We introduce PrivQA -- a benchmark to assess the privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.
We find that adversaries can easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs.
We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections.
arXiv Detail & Related papers (2023-10-03T17:30:33Z) - Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Training Natural Language Processing Models on Encrypted Text for
Enhanced Privacy [0.0]
We propose a method for training NLP models on encrypted text data to mitigate data privacy concerns.
Our results indicate that both encrypted and non-encrypted models achieve comparable performance.
arXiv Detail & Related papers (2023-05-03T00:37:06Z) - Mitigating Approximate Memorization in Language Models via Dissimilarity
Learned Policy [0.0]
Large language models (LLMs) are trained on large amounts of data.
LLMs have been shown to memorize parts of their training data and to emit that data verbatim when an adversary prompts them appropriately.
arXiv Detail & Related papers (2023-05-02T15:53:28Z) - Planting and Mitigating Memorized Content in Predictive-Text Language
Models [11.911353678499008]
Language models are widely deployed to provide automatic text completion services in user products.
Recent research has revealed that language models bear considerable risk of memorizing private training data.
In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text.
arXiv Detail & Related papers (2022-12-16T17:57:14Z) - Preventing Verbatim Memorization in Language Models Gives a False Sense
of Privacy [91.98116450958331]
We argue that verbatim memorization definitions are too restrictive and fail to capture more subtle forms of memorization.
Specifically, we design and implement an efficient defense that perfectly prevents all verbatim memorization (a sketch of this style of defense appears after this list).
We conclude by discussing potential alternative definitions and why defining memorization is a difficult yet crucial open question for neural language models.
arXiv Detail & Related papers (2022-10-31T17:57:55Z) - Mitigating Unintended Memorization in Language Models via Alternating
Teaching [15.112637366882185]
We propose a novel approach to mitigate unintended memorization in sequential modeling.
In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect.
Experiments on LibriSpeech datasets show that the proposed method achieves superior privacy-preserving results.
arXiv Detail & Related papers (2022-10-13T06:26:41Z) - Defending against Reconstruction Attacks with Rényi Differential
Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees against reconstruction attacks that are better than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
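The entry "Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy" above discusses defenses that block a model from regenerating training text word for word. The sketch below shows the kind of n-gram filter that entry critiques, assuming a hash-set of training n-grams and a greedy word-level decoder; the helper names (`training_ngrams`, `filtered_decode`), the 3-gram length, and the toy candidate generator are hypothetical, not that paper's implementation.

```python
# Illustrative sketch of a verbatim-memorization filter; assumptions only.
N = 3  # block any generated 3-gram that appears verbatim in the training data

def training_ngrams(corpus, n=N):
    """Collect every word-level n-gram that occurs in the training corpus."""
    grams = set()
    for doc in corpus:
        toks = doc.split()
        grams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return grams

def filtered_decode(next_token_candidates, prefix, blocked, max_len=4):
    """Greedy decoding that rejects any candidate completing a blocked n-gram."""
    out = list(prefix)
    for _ in range(max_len):
        for tok in next_token_candidates(out):
            if tuple(out[-(N - 1):] + [tok]) not in blocked:
                out.append(tok)
                break
        else:
            return out  # every candidate would reproduce training text verbatim
    return out

# Toy usage: a fixed candidate list stands in for a real model's top-k tokens.
blocked = training_ngrams(["the secret key is 42"])
print(filtered_decode(lambda ctx: ["is", "42", "done"], ["the", "secret", "key"], blocked))
# The blocked training 3-gram "key is 42" is never produced verbatim.
```

As that entry argues, blocking exact n-grams still leaves more subtle, near-verbatim forms of memorization recoverable, which is why such filters can give a false sense of privacy.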