Can Language Models be Instructed to Protect Personal Information?
- URL: http://arxiv.org/abs/2310.02224v1
- Date: Tue, 3 Oct 2023 17:30:33 GMT
- Title: Can Language Models be Instructed to Protect Personal Information?
- Authors: Yang Chen, Ethan Mendes, Sauvik Das, Wei Xu, Alan Ritter
- Abstract summary: We introduce PrivQA -- a benchmark to assess the privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.
We find that adversaries can easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs.
We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections.
- Score: 30.187731765653428
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large multimodal language models have proven transformative in numerous
applications. However, these models have been shown to memorize and leak
pre-training data, raising serious user privacy and information security
concerns. While data leaks should be prevented, it is also crucial to examine
the trade-off between the privacy protection and model utility of proposed
approaches. In this paper, we introduce PrivQA -- a multimodal benchmark to
assess this privacy/utility trade-off when a model is instructed to protect
specific categories of personal information in a simulated scenario. We also
propose a technique to iteratively self-moderate responses, which significantly
improves privacy. However, through a series of red-teaming experiments, we find
that adversaries can also easily circumvent these protections with simple
jailbreaking methods through textual and/or image inputs. We believe PrivQA has
the potential to support the development of new models with improved privacy
protections, as well as the adversarial robustness of these protections. We
release the entire PrivQA dataset at https://llm-access-control.github.io/.
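The iterative self-moderation idea described in the abstract can be pictured as a small answer-check-revise loop: the model drafts a response, audits its own draft against the protected category, and rewrites or refuses if the audit flags a leak. The sketch below is a minimal illustration under assumed prompts and a hypothetical `query_model` helper; it is not the paper's exact procedure or prompt set.

```python
# Hedged sketch of iterative self-moderation for privacy protection.
# `query_model` is a hypothetical stand-in for any chat-completion API;
# the prompts, category, and loop structure are illustrative assumptions,
# not the procedure used in the PrivQA paper.

PROTECTED_CATEGORY = "personal information about private citizens"  # example category
MAX_ROUNDS = 3


def query_model(prompt: str) -> str:
    """Placeholder for a call to an instruction-following language model."""
    raise NotImplementedError("Wire this up to your model or API of choice.")


def self_moderated_answer(question: str) -> str:
    # Initial draft, with the protection instruction in the prompt.
    answer = query_model(
        f"Answer the question, but refuse if the answer would reveal "
        f"{PROTECTED_CATEGORY}.\n\nQuestion: {question}"
    )
    for _ in range(MAX_ROUNDS):
        # Ask the model to audit its own draft against the protected category.
        verdict = query_model(
            f"Does the following response reveal {PROTECTED_CATEGORY}? "
            f"Reply YES or NO.\n\nResponse: {answer}"
        )
        if verdict.strip().upper().startswith("NO"):
            return answer  # Draft passes the self-check; return it.
        # Otherwise, ask the model to rewrite the response without the leak.
        answer = query_model(
            f"Rewrite the response so it no longer reveals {PROTECTED_CATEGORY}, "
            f"or refuse to answer if that is impossible.\n\nResponse: {answer}"
        )
    # Fall back to a refusal if the draft still fails after the final round.
    return "I can't share that information."
```

A loop like this trades utility for privacy: each extra moderation round tends to reduce leakage but also increases refusals on benign questions, which is exactly the trade-off PrivQA is designed to measure.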
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Privacy Backdoors: Stealing Data with Corrupted Pretrained Models [23.54726973460633]
Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications.
We show that this practice introduces a new risk of privacy backdoors.
We show how to build privacy backdoors for a variety of models, including transformers.
arXiv Detail & Related papers (2024-03-30T20:43:53Z)
- Membership Inference Attacks and Privacy in Topic Modeling [3.503833571450681]
We propose an attack against topic models that can confidently identify members of the training data.
We propose a framework for private topic modeling that incorporates DP vocabulary selection as a pre-processing step.
arXiv Detail & Related papers (2024-03-07T12:43:42Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models, GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- RecUP-FL: Reconciling Utility and Privacy in Federated Learning via User-configurable Privacy Defense [9.806681555309519]
Federated learning (FL) allows clients to collaboratively train a model without sharing their private data.
Recent studies have shown that private information can still be leaked through shared gradients.
We propose a user-configurable privacy defense, RecUP-FL, that can better focus on the user-specified sensitive attributes.
arXiv Detail & Related papers (2023-04-11T10:59:45Z)
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Defending against Reconstruction Attacks with Rényi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees against reconstruction attacks that are better than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
- Selective Differential Privacy for Language Modeling [36.64464956102432]
Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees.
We propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data.
Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utilities.
arXiv Detail & Related papers (2021-08-30T01:11:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.