Deduplicating Training Data Mitigates Privacy Risks in Language Models
- URL: http://arxiv.org/abs/2202.06539v2
- Date: Wed, 16 Feb 2022 18:55:11 GMT
- Title: Deduplicating Training Data Mitigates Privacy Risks in Language Models
- Authors: Nikhil Kandpal, Eric Wallace, Colin Raffel
- Abstract summary: We show that the success of privacy attacks is largely due to duplication in commonly used web-scraped training sets.
We show that the rate at which language models regenerate training sequences is superlinearly related to a sequence's count in the training set.
We find that after applying methods to deduplicate training data, language models are considerably more secure against these types of privacy attacks.
- Score: 35.643052320353114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Past work has shown that large language models are susceptible to privacy
attacks, where adversaries generate sequences from a trained model and detect
which sequences are memorized from the training set. In this work, we show that
the success of these attacks is largely due to duplication in commonly used
web-scraped training sets. We first show that the rate at which language models
regenerate training sequences is superlinearly related to a sequence's count in
the training set. For instance, a sequence that is present 10 times in the
training data is on average generated ~1000 times more often than a sequence
that is present only once. We next show that existing methods for detecting
memorized sequences have near-chance accuracy on non-duplicated training
sequences. Finally, we find that after applying methods to deduplicate training
data, language models are considerably more secure against these types of
privacy attacks. Taken together, our results motivate an increased focus on
deduplication in privacy-sensitive applications and a reevaluation of the
practicality of existing privacy attacks.
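The two quantities the abstract relies on, a sequence's count in the training set and the relation between that count and its regeneration rate, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: whitespace tokenization, fixed-length token windows, exact-match document deduplication, and the hypothetical helper names `window_counts`, `drop_duplicate_docs`, and `implied_exponent`. The last helper assumes a pure power law, rate ∝ count^k, which the abstract does not state; under that assumption the quoted figures (10 times the count, ~1000 times the rate) would correspond to k ≈ 3.

```python
import math
from collections import Counter


def window_counts(docs, n=5):
    """Count how often each n-token window occurs across all documents.

    Assumes whitespace tokenization; documents shorter than n tokens
    contribute no windows.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def drop_duplicate_docs(docs):
    """Keep only the first occurrence of each exact document string."""
    seen = set()
    unique = []
    for doc in docs:
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique


def implied_exponent(count_ratio, rate_ratio):
    """Exponent k such that rate_ratio == count_ratio ** k.

    Only meaningful under the illustrative power-law assumption.
    """
    return math.log(rate_ratio) / math.log(count_ratio)


if __name__ == "__main__":
    corpus = ["a b c d e f", "a b c d e f", "x y z"]
    print(window_counts(corpus, n=5).most_common(1))
    print(drop_duplicate_docs(corpus))
    print(implied_exponent(10, 1000))
```

Counting windows before and after `drop_duplicate_docs` shows how exact deduplication lowers each window's count. Near-duplicate or substring-level deduplication would need a different technique from this exact-match sketch.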
Related papers
- Memory Backdoor Attacks on Neural Networks [3.2720947374803777]
We propose the memory backdoor attack, where a model is covertly trained to memorize specific training samples and later selectively output them.
We demonstrate the attack on image classifiers, segmentation models, and a large language model (LLM).
arXiv Detail & Related papers (2024-11-21T16:09:16Z)
- Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences.
We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even for out-of-distribution sequences.
arXiv Detail & Related papers (2024-07-25T07:10:31Z)
- Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble [13.893379594151533]
We propose a novel leave-one-out ensemble method to unlearn the targeted textual sequences that need to be forgotten from the model.
Experiments on LibriSpeech and WikiText-103 datasets show that the proposed method achieves better privacy-utility trade-offs than its counterparts.
arXiv Detail & Related papers (2023-09-28T00:43:18Z)
- Mitigating Unintended Memorization in Language Models via Alternating Teaching [15.112637366882185]
We propose a novel approach to mitigate unintended memorization in sequential modeling.
In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect.
Experiments on LibriSpeech datasets show that the proposed method achieves superior privacy-preserving results.
arXiv Detail & Related papers (2022-10-13T06:26:41Z)
- Measuring Forgetting of Memorized Training Examples [80.9188503645436]
We show machine learning models exhibit two seemingly contradictory phenomena: training data memorization and various forms of forgetting.
In memorization, models overfit specific training examples and become susceptible to privacy attacks.
We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget examples over time.
arXiv Detail & Related papers (2022-06-30T20:48:26Z)
- Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models [67.48894919842576]
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box on tasks that require numeracy.
We propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both number tokenization and numeracy-targeted pretraining objectives in one extended pretraining step.
Our experiments show the effectiveness of Arithmetic-Based Pretraining in three different tasks that require improved numeracy.
arXiv Detail & Related papers (2022-05-13T16:10:13Z)
- Provably Confidential Language Modelling [36.37616789197548]
We propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments.
We show that our method is able to provably prevent unintended memorization by randomizing parts of the training process.
Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality.
arXiv Detail & Related papers (2022-05-04T02:33:45Z)
- Extracting Training Data from Large Language Models [78.3839333127544]
This paper demonstrates that an adversary can perform a training data extraction attack to recover individual training examples by querying the language model.
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data.
arXiv Detail & Related papers (2020-12-14T18:39:09Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the computation cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)