Mitigating Approximate Memorization in Language Models via Dissimilarity
Learned Policy
- URL: http://arxiv.org/abs/2305.01550v1
- Date: Tue, 2 May 2023 15:53:28 GMT
- Title: Mitigating Approximate Memorization in Language Models via Dissimilarity
Learned Policy
- Authors: Aly M. Kassem
- Abstract summary: Large Language Models (LLMs) are trained on large amounts of data.
LLMs have been shown to memorize parts of their training data and to emit those data verbatim when an adversary prompts them appropriately.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are trained on large amounts of data, which can
include sensitive information that may compromise personal privacy. LLMs have been
shown to memorize parts of their training data and to emit those data verbatim when an
adversary prompts them appropriately. Previous research has primarily focused on data
preprocessing and differential privacy techniques to address memorization, or on
preventing verbatim memorization exclusively, which can give a false sense of privacy.
However, these methods rely on explicit and implicit assumptions about the structure of
the data to be protected, which often results in an incomplete solution to the problem.
To address this, we propose a novel framework that uses a reinforcement learning
approach (PPO) to fine-tune LLMs to mitigate approximate memorization. Our approach
uses a negative similarity score, such as BERTScore or SacreBLEU, as a reward signal
to learn a dissimilarity policy. Our results demonstrate that this framework effectively
mitigates approximate memorization while maintaining high levels of coherence and
fluency in the generated samples. Furthermore, the framework is robust in mitigating
approximate memorization across various circumstances, including longer contexts,
which are known to increase memorization in LLMs.
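To make the reward design concrete, the sketch below shows one way such a dissimilarity reward could be wired into a single PPO fine-tuning step. This is an illustrative assumption, not the paper's released implementation: the TRL PPOTrainer, the gpt2 checkpoint, the single prefix/suffix pair, and all hyperparameters are placeholders (the exact TRL API also differs across library versions), and SacreBLEU is used here where BERTScore would serve equally well.

# Sketch: PPO fine-tuning with a dissimilarity (negative-similarity) reward.
# Assumptions: the TRL library's classic PPOTrainer API and sacrebleu;
# model name, data, and hyperparameters are illustrative placeholders.
import torch
import sacrebleu
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "gpt2"  # placeholder; the paper targets larger pretrained LMs
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

# Hypothetical (prefix, memorized suffix) pair taken from the training data.
prefixes = ["The quarterly report was emailed to"]
memorized_suffixes = ["jane.doe@example.com along with the draft budget."]

query_tensors = [tokenizer(p, return_tensors="pt").input_ids[0] for p in prefixes]
response_tensors = ppo_trainer.generate(
    query_tensors,
    return_prompt=False,
    max_new_tokens=32,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
responses = tokenizer.batch_decode(response_tensors, skip_special_tokens=True)

# Dissimilarity reward: the less the generated continuation resembles the
# memorized suffix, the higher the reward (equivalent to a negated similarity).
rewards = []
for generated, reference in zip(responses, memorized_suffixes):
    similarity = sacrebleu.sentence_bleu(generated, [reference]).score / 100.0
    rewards.append(torch.tensor(1.0 - similarity))

stats = ppo_trainer.step(query_tensors, response_tensors, rewards)  # one PPO update

In practice this step would be repeated over batches of prefixes drawn from known or suspected memorized samples; the KL penalty against the frozen reference model (handled internally by PPOTrainer) is what keeps the fine-tuned policy coherent and fluent while it learns to diverge from the memorized continuations.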
Related papers
- Rethinking LLM Memorization through the Lens of Adversarial Compression [93.13830893086681]
Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage.
One major question is whether these models "memorize" all their training data or whether they integrate many data sources in a way more akin to how a human learns and synthesizes information.
We propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs.
arXiv Detail & Related papers (2024-04-23T15:49:37Z)
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination [58.36408867180233]
Large Language Models (LLMs) struggle with crucial issues of privacy violation and unwanted exposure of sensitive data.
We introduce a novel approach termed deliberate imagination in the context of LLM unlearning.
Our results demonstrate the usefulness of this approach across different models and sizes, and also with parameter-efficient fine-tuning.
arXiv Detail & Related papers (2024-02-15T16:21:14Z)
- SoK: Memorization in General-Purpose Large Language Models [25.448127387943053]
Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development.
LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways.
We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals.
arXiv Detail & Related papers (2023-10-24T14:25:53Z)
- Exploring Memorization in Fine-tuned Language Models [53.52403444655213]
We conduct the first comprehensive analysis to explore language models' memorization during fine-tuning across tasks.
Our studies of open-source LMs and of our own fine-tuned LMs across various tasks indicate that memorization varies strongly across different fine-tuning tasks.
We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.
arXiv Detail & Related papers (2023-10-10T15:41:26Z)
- Large Language Models Can Be Good Privacy Protection Learners [53.07930843882592]
We introduce Privacy Protection Language Models (PPLM), a novel paradigm for fine-tuning language models.
Our work offers a theoretical analysis for model design and delves into various techniques such as corpus curation, penalty-based unlikelihood in training loss, and instruction-based tuning.
In particular, instruction tuning with both positive and negative examples stands out as a promising method, effectively protecting private data while enhancing the model's knowledge.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Knowledge Sanitization of Large Language Models [4.722882736419499]
Large language models (LLMs) trained on a large corpus of Web data can potentially reveal sensitive or confidential information.
Our technique efficiently fine-tunes these models using the Low-Rank Adaptation (LoRA) method.
Experimental results in a closed-book question-answering task show that our straightforward method not only minimizes particular knowledge leakage but also preserves the overall performance of LLMs.
arXiv Detail & Related papers (2023-09-21T07:49:55Z)
- Quantifying and Analyzing Entity-level Memorization in Large Language Models [4.59914731734176]
Large language models (LLMs) have been proven capable of memorizing their training data.
Privacy risks arising from memorization have attracted increasing attention.
We propose a fine-grained, entity-level definition to quantify memorization with conditions and metrics closer to real-world scenarios.
arXiv Detail & Related papers (2023-08-30T03:06:47Z)
- Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy [91.98116450958331]
We argue that verbatim memorization definitions are too restrictive and fail to capture more subtle forms of memorization.
Specifically, we design and implement an efficient defense that perfectly prevents all verbatim memorization.
We conclude by discussing potential alternative definitions and why defining memorization is a difficult yet crucial open question for neural language models.
arXiv Detail & Related papers (2022-10-31T17:57:55Z)
- Knowledge Unlearning for Mitigating Privacy Risks in Language Models [31.322818016245087]
We propose knowledge unlearning as an alternative method to reduce privacy risks for language models.
We show that simply applying the unlikelihood training objective to target token sequences is effective at forgetting them.
We show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori.
arXiv Detail & Related papers (2022-10-04T10:18:11Z)
- Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data.
An open question in previous studies of language model memorization is how to filter out "common" memorization.
We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training (see the formula sketch after this list).
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
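For reference, the counterfactual memorization in the last entry above is commonly stated as an expected-performance gap; the notation below paraphrases the published definition (perf is a per-example performance measure such as per-token accuracy, and S ranges over random subsets of the training data):

mem(x) = E_{S : x in S}[ perf(f_S, x) ] - E_{S' : x not in S'}[ perf(f_{S'}, x) ]

where f_S denotes a model trained on subset S; a large gap indicates that the model's behavior on x depends heavily on having seen x itself during training.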