Related papers: Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

URL: http://arxiv.org/abs/2204.07667v1
Date: Fri, 15 Apr 2022 22:36:55 GMT
Title: Just Fine-tune Twice: Selective Differential Privacy for Large Language Models
Authors: Weiyan Shi, Si Chen, Chiyuan Zhang, Ruoxi Jia, Zhou Yu
Abstract summary: We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models. Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
Score: 69.66654761324702
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the increasing adoption of NLP models in real-world products, it becomes more and more important to protect these models from privacy leakage. Because private information in language data is sparse, previous research formalized a Selective-Differential-Privacy (SDP) notion to provide protection for sensitive tokens detected by policy functions, and prove its effectiveness on RNN-based models. But the previous mechanism requires separating the private and public model parameters and thus cannot be applied on large attention-based models. In this paper, we propose a simple yet effective just-fine-tune-twice privacy mechanism to first fine-tune on in-domain redacted data and then on in-domain private data, to achieve SDP for large Transformer-based language models. We also design explicit and contextual policy functions to provide protections at different levels. Experiments show that our models achieve strong performance while staying robust to the canary insertion attack. We further show that even under low-resource settings with a small amount of in-domain data, SDP can still improve the model utility. We will release the code, data and models to facilitate future research.

Related papers

Private prediction for large-scale synthetic text generation [28.488459921169905]
We present an approach for generating differentially private synthetic text using large language models (LLMs) In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees.
arXiv Detail & Related papers (2024-07-16T18:28:40Z)
Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit. We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
Membership Inference Attacks and Privacy in Topic Modeling [3.503833571450681]
We propose an attack against topic models that can confidently identify members of the training data. We propose a framework for private topic modeling that incorporates DP vocabulary selection as a pre-processing step.
arXiv Detail & Related papers (2024-03-07T12:43:42Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
You Are What You Write: Preserving Privacy in the Era of Large Language Models [2.3431670397288005]
We present an empirical investigation into the extent of the personal information encoded into pre-trained representations by a range of popular models. We show a positive correlation between the complexity of a model, the amount of data used in pre-training, and data leakage.
arXiv Detail & Related papers (2022-04-20T11:12:53Z)
Defending against Reconstruction Attacks with R\'enyi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model. Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget. We show that, for a same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
Selective Differential Privacy for Language Modeling [36.64464956102432]
Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. We propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data. Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utilities.
arXiv Detail & Related papers (2021-08-30T01:11:10Z)
Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters. We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data. Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.