Mitigating Unintended Memorization in Language Models via Alternating Teaching
- URL: http://arxiv.org/abs/2210.06772v1
- Date: Thu, 13 Oct 2022 06:26:41 GMT
- Title: Mitigating Unintended Memorization in Language Models via Alternating Teaching
- Authors: Zhe Liu, Xuedong Zhang, Fuchun Peng
- Abstract summary: We propose a novel approach to mitigate unintended memorization in sequential modeling.
In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect.
Experiments on LibriSpeech datasets show that the proposed method achieves superior privacy-preserving results.
- Score: 15.112637366882185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has shown that language models have a tendency to memorize
rare or unique sequences in the training corpora which can thus leak sensitive
attributes of user data. We employ a teacher-student framework and propose a
novel approach called alternating teaching to mitigate unintended memorization
in sequential modeling. In our method, multiple teachers are trained on
disjoint training sets whose privacy one wishes to protect, and teachers'
predictions supervise the training of a student model in an alternating manner
at each time step. Experiments on LibriSpeech datasets show that the proposed
method achieves better privacy-preserving results than other counterparts. Compared
with applying no mitigation of unintended memorization, the overall utility loss is
small when training records are sufficient.
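To make the described training scheme concrete, below is a minimal sketch of alternating teaching in PyTorch-style Python. It assumes each teacher has been pre-trained on one of the disjoint private shards, that every model maps a batch of token ids to next-token logits, and that supervision alternates across training steps in round-robin order using a temperature-scaled KL distillation loss. These interface details, helper names, and hyperparameters are illustrative assumptions, not the authors' implementation (the abstract's "alternating manner at each time step" could also be read as alternating per token position).

import torch
import torch.nn.functional as F

def train_student_alternating(student, teachers, loader, optimizer,
                              num_steps, temperature=1.0):
    """Illustrative sketch: one teacher supervises the student per training step."""
    batches = iter(loader)
    for step in range(num_steps):
        t = step % len(teachers)  # round-robin choice of the supervising teacher
        try:
            input_ids = next(batches)
        except StopIteration:
            batches = iter(loader)  # restart the student's (assumed non-private) data
            input_ids = next(batches)

        with torch.no_grad():
            teacher_logits = teachers[t](input_ids)  # [batch, seq_len, vocab]

        student_logits = student(input_ids)

        # Distillation loss: match the student's per-token distribution to the
        # selected teacher's soft targets via temperature-scaled KL divergence.
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In this sketch each private shard influences the student only through its teacher's predictions, which is what limits the student's exposure to rare sequences memorized by any single teacher.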
Related papers
- Exploring Memorization in Fine-tuned Language Models [53.52403444655213]
We conduct the first comprehensive analysis to explore language models' memorization during fine-tuning across tasks.
Our studies with open-source and our own fine-tuned LMs across various tasks indicate that memorization differs strongly across fine-tuning tasks.
We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.
arXiv Detail & Related papers (2023-10-10T15:41:26Z)
- Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble [13.893379594151533]
We propose a novel leave-one-out ensemble method to unlearn the targeted textual sequences that need to be forgotten from the model.
Experiments on LibriSpeech and WikiText-103 datasets show that the proposed method achieves better privacy-utility trade-offs than other counterparts.
arXiv Detail & Related papers (2023-09-28T00:43:18Z)
- Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy [91.98116450958331]
We argue that verbatim memorization definitions are too restrictive and fail to capture more subtle forms of memorization.
Specifically, we design and implement an efficient defense that perfectly prevents all verbatim memorization.
We conclude by discussing potential alternative definitions and why defining memorization is a difficult yet crucial open question for neural language models.
arXiv Detail & Related papers (2022-10-31T17:57:55Z)
- An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition [51.20130423303659]
We propose an ensemble learning framework with Poisson sub-sampling to train a collection of teacher models that provide a differential privacy (DP) guarantee for the training data.
Through boosting under DP, the resulting student model suffers little degradation compared with models trained without privacy protection.
Our solution leverages two mechanisms: (i) privacy budget amplification via Poisson sub-sampling, so that the target prediction model requires less noise to achieve the same privacy budget, and (ii) the combination of this sub-sampling technique with an ensemble teacher-student learning framework.
arXiv Detail & Related papers (2022-10-12T16:34:08Z)
- Deduplicating Training Data Mitigates Privacy Risks in Language Models [35.643052320353114]
We show that the success of privacy attacks is largely due to duplication in commonly used web-scraped training sets.
We show that the rate at which language models regenerate training sequences is superlinearly related to a sequence's count in the training set.
We find that after applying methods to deduplicate training data, language models are considerably more secure against these types of privacy attacks.
arXiv Detail & Related papers (2022-02-14T08:20:15Z)
- Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data.
An open question in previous studies of language model memorization is how to filter out "common" memorization.
We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training (a notational sketch of this quantity appears after this list).
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
- Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z)
- FaceLeaks: Inference Attacks against Transfer Learning Models via Black-box Queries [2.7564955518050693]
We investigate if one can leak or infer private information without interacting with the teacher model directly.
We propose novel strategies to infer from aggregate-level information.
Our study indicates that information leakage is a real privacy threat to the transfer learning framework widely used in real-life situations.
arXiv Detail & Related papers (2020-10-27T03:02:40Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
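As referenced in the Counterfactual Memorization entry above: that notion compares models trained with and without a given document. A notational sketch follows, where the symbols (a random training subset S, a model f_S trained on S, and a per-example performance measure perf) are illustrative rather than the paper's exact formulation:

\mathrm{mem}(x) \;=\; \mathbb{E}_{S \ni x}\bigl[\mathrm{perf}(f_S, x)\bigr] \;-\; \mathbb{E}_{S \not\ni x}\bigl[\mathrm{perf}(f_S, x)\bigr]

A large gap means the model's behavior on document x depends heavily on whether x was seen during training, distinguishing such examples from the "common" content that any subset of the data would teach the model.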