What Does it Mean for a Language Model to Preserve Privacy?
- URL: http://arxiv.org/abs/2202.05520v2
- Date: Mon, 14 Feb 2022 06:55:34 GMT
- Title: What Does it Mean for a Language Model to Preserve Privacy?
- Authors: Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri,
Florian Tramèr
- Abstract summary: Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life.
We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models.
We conclude that language models should be trained on text data which was explicitly produced for public use.
- Score: 12.955456268790005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language reflects our private lives and identities, making its
privacy concerns as broad as those of real life. Language models lack the
ability to understand the context and sensitivity of text, and tend to memorize
phrases present in their training sets. An adversary can exploit this tendency
to extract training data. Depending on the nature of the content and the
context in which this data was collected, this could violate expectations of
privacy. Thus there is a growing interest in techniques for training language
models that preserve privacy. In this paper, we discuss the mismatch between
the narrow assumptions made by popular data protection techniques (data
sanitization and differential privacy), and the broadness of natural language
and of privacy as a social norm. We argue that existing protection methods
cannot guarantee a generic and meaningful notion of privacy for language
models. We conclude that language models should be trained on text data which
was explicitly produced for public use.
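The extraction risk described in the abstract can be probed directly: prompt a model with a prefix that may have appeared in its training data and check whether it completes the rest verbatim. A minimal sketch, assuming a HuggingFace causal language model; the model name, prefix, and "secret" suffix below are illustrative placeholders, not data from the paper:

```python
# Toy memorization probe: feed the model a prefix that (hypothetically)
# appeared in its training data and check for a verbatim completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any HuggingFace causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prefix = "Jane Doe's phone number is"   # hypothetical training-set prefix
secret = " 555-0172"                    # hypothetical memorized suffix

inputs = tokenizer(prefix, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=8,
    do_sample=False,  # greedy decoding surfaces high-confidence continuations
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
print("verbatim leak" if completion.startswith(secret) else "no verbatim leak")
```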
Related papers
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We propose sanitizing sensitive text using two common strategies employed by humans.
We curate the first such corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
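A toy illustration of two human sanitization strategies, deleting a sensitive span outright versus abstracting it into a generic placeholder; a hedged sketch with an assumed regex pattern, not the NAP^2 pipeline:

```python
# Toy sanitization sketch (not the NAP^2 method): two rule-based strategies,
# deletion and abstraction, applied to an assumed phone-number pattern.
import re

PHONE = re.compile(r"\b\d{3}-\d{4}\b")  # assumption: toy pattern for demo text

def delete_sensitive(text: str) -> str:
    # Strategy 1: remove the sensitive span entirely.
    return PHONE.sub("", text)

def abstract_sensitive(text: str) -> str:
    # Strategy 2: replace the span with a generic placeholder.
    return PHONE.sub("[PHONE NUMBER]", text)

text = "Call me at 555-0172 tomorrow."
print(delete_sensitive(text))    # "Call me at  tomorrow."
print(abstract_sensitive(text))  # "Call me at [PHONE NUMBER] tomorrow."
```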
- Privacy-Preserving Language Model Inference with Instance Obfuscation [33.86459812694288]
Language Models as a Service (LMaaS) offers developers and researchers convenient access to inference with pre-trained language models.
The input data and the inference results containing private information are exposed as plaintext during the service call, leading to privacy issues.
We propose the Instance-Obfuscated Inference (IOI) method, which addresses the decision-privacy issue in natural language understanding tasks.
arXiv Detail & Related papers (2024-02-13T05:36:54Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models, such as GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
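The "positive and negative examples" finding above suggests contrastive instruction-tuning data along the following lines; a hypothetical sketch of the data format, not the paper's actual dataset:

```python
# Hypothetical instruction-tuning pairs for contextual privacy protection:
# positive examples demonstrate the desired redaction behavior, negative
# examples mark leaky responses to be penalized or avoided during tuning.
examples = [
    {
        "instruction": "Summarize this email thread, including contact details.",
        "response": "Summary: ... (contact details withheld; they are private).",
        "kind": "positive",  # behavior to imitate
    },
    {
        "instruction": "Summarize this email thread, including contact details.",
        "response": "Summary: ... Jane can be reached at 555-0172.",
        "kind": "negative",  # leaky behavior to avoid
    },
]
```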
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy-policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Planting and Mitigating Memorized Content in Predictive-Text Language Models [11.911353678499008]
Language models are widely deployed to provide automatic text completion services in user products.
Recent research has revealed that language models bear considerable risk of memorizing private training data.
In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text.
arXiv Detail & Related papers (2022-12-16T17:57:14Z)
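Unintended memorization of this kind is often measured by planting "canary" strings in the training data and then ranking the canary's likelihood against random decoys. A minimal sketch in the spirit of exposure metrics, not this paper's exact protocol; the model and strings are placeholders:

```python
# Rank a planted canary among decoys by mean token negative log-likelihood;
# a low (good) rank for the canary suggests it was memorized during training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumption: demo model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_nll(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()  # mean token NLL

canary = "my pin code is 4921"                     # hypothetical planted string
decoys = [f"my pin code is {n}" for n in (1234, 8473, 5550)]

ranking = sorted([canary] + decoys, key=mean_nll)
print("canary rank:", ranking.index(canary) + 1, "of", len(ranking))
```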
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z)
- Defending against Reconstruction Attacks with Rényi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees against reconstruction attacks that are tighter than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
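For reference, Rényi differential privacy bounds the Rényi divergence of order \(\alpha\) between a mechanism's output distributions on adjacent datasets; this is the standard definition (Mironov, 2017), not a formula taken from this paper:

```latex
% A mechanism M is (\alpha, \epsilon)-RDP if, for all adjacent datasets D, D':
D_\alpha\!\left(M(D) \,\|\, M(D')\right)
  = \frac{1}{\alpha - 1}\,
    \log \mathbb{E}_{x \sim M(D')}\!\left[
      \left(\frac{\Pr[M(D) = x]}{\Pr[M(D') = x]}\right)^{\alpha}
    \right]
  \le \epsilon .
```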
- Selective Differential Privacy for Language Modeling [36.64464956102432]
Previous work has attempted to protect sensitive user text by training RNN-based language models with differential privacy guarantees.
We propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data.
Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utility.
arXiv Detail & Related papers (2021-08-30T01:11:10Z)
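Informally, selective DP fixes a policy function F that marks which attributes of each record are sensitive, and requires indistinguishability only across datasets that differ in those attributes; a sketch of the notion following Shi et al., hedged since the formal details live in that paper:

```latex
% Sketch: with policy function F marking sensitive attributes, M is
% (F, \epsilon)-selective DP if for all F-neighbors D, D' (datasets that
% differ only in sensitive attributes) and all output sets S:
\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S].
```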
- Privacy-Adaptive BERT for Natural Language Understanding [20.821155542969947]
We study how to improve the effectiveness of NLU models under a local privacy setting using BERT.
We propose privacy-adaptive LM pretraining methods and demonstrate that they can significantly improve model performance on privatized text input.
arXiv Detail & Related papers (2021-04-15T15:01:28Z)
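"Privatized text input" under local privacy typically means each user perturbs tokens before they leave the device. A toy randomized-response-style sketch over a small vocabulary, not the paper's mechanism:

```python
# Toy local text privatization: keep each token with probability p_keep,
# else replace it with a uniformly random vocabulary token before sending.
import random

def privatize(tokens: list[str], vocab: list[str], p_keep: float) -> list[str]:
    return [t if random.random() < p_keep else random.choice(vocab)
            for t in tokens]

vocab = ["the", "a", "meeting", "doctor", "tomorrow", "at", "noon"]  # toy vocab
print(privatize("the doctor meeting is tomorrow".split(), vocab, p_keep=0.7))
```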
- Differentially Private Language Models Benefit from Public Pre-training [1.2676356746752895]
We study the feasibility of learning a language model that is simultaneously high-quality and privacy-preserving.
We find that DP fine-tuning boosts the performance of language models in the private domain.
arXiv Detail & Related papers (2020-09-13T00:50:44Z)
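The public-pretraining-then-private-fine-tuning recipe in the last entry is commonly implemented with DP-SGD. A minimal sketch using Opacus; the tiny model, random data, and hyperparameters are placeholder assumptions standing in for a real pretrained language model and a private corpus:

```python
# Sketch: DP-SGD fine-tuning with Opacus (per-sample clipping + noise).
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = torch.nn.Linear(16, 2)  # stand-in for a pretrained LM head
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # assumption: noise scale
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

criterion = torch.nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()       # clipped, noised update

print("epsilon spent:", privacy_engine.get_epsilon(delta=1e-5))
```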
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.