Localizing Paragraph Memorization in Language Models
- URL: http://arxiv.org/abs/2403.19851v1
- Date: Thu, 28 Mar 2024 21:53:24 GMT
- Title: Localizing Paragraph Memorization in Language Models
- Authors: Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis
- Abstract summary: We show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern.
We also show that memorized continuations are not only harder to unlearn but also harder to corrupt than non-memorized ones.
- Score: 17.943637462569537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can we localize the weights and mechanisms used by a language model to memorize and recite entire paragraphs of its training data? In this paper, we show that while memorization is spread across multiple layers and model components, gradients of memorized paragraphs have a distinguishable spatial pattern, being larger in lower model layers than gradients of non-memorized examples. Moreover, the memorized examples can be unlearned by fine-tuning only the high-gradient weights. We localize a low-layer attention head that appears to be especially involved in paragraph memorization. This head predominantly focuses its attention on distinctive, rare tokens that are least frequent in a corpus-level unigram distribution. Next, we study how localized memorization is across the tokens in the prefix by perturbing tokens and measuring the resulting change in the decoding. A few distinctive tokens early in a prefix can often corrupt the entire continuation. Overall, memorized continuations are not only harder to unlearn but also harder to corrupt than non-memorized ones.
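To make the gradient-based localization described in the abstract concrete, here is a minimal sketch (not the authors' released code) that compares per-layer gradient norms for a memorized versus a non-memorized paragraph. It assumes a HuggingFace causal LM whose transformer blocks are named transformer.h.<layer> (true for GPT-Neo- and GPT-2-style models); the checkpoint name and the example paragraphs are placeholders.

```python
# Minimal sketch (not the paper's code): compare per-layer gradient norms
# for a memorized vs. a non-memorized paragraph. Assumes a causal LM whose
# block parameters are named "transformer.h.<layer>...." (GPT-Neo/GPT-2 style);
# adapt the regex below for other architectures.
import re
from collections import defaultdict

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/gpt-neo-125M"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def per_layer_grad_norms(text: str) -> dict[int, float]:
    """Backprop the next-token loss on `text` and sum gradient norms per layer."""
    model.zero_grad()
    enc = tokenizer(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    sq_norms = defaultdict(float)
    for name, param in model.named_parameters():
        match = re.search(r"\bh\.(\d+)\.", name)  # extract the block index
        if match and param.grad is not None:
            sq_norms[int(match.group(1))] += param.grad.norm().item() ** 2
    return {layer: sq ** 0.5 for layer, sq in sorted(sq_norms.items())}

# Placeholder paragraphs; in the paper, memorized paragraphs are training
# paragraphs whose continuations the model reproduces (near-)verbatim.
memorized = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
non_memorized = "A freshly written sentence the model has never seen before."

for label, paragraph in [("memorized", memorized), ("non-memorized", non_memorized)]:
    print(label, per_layer_grad_norms(paragraph))
```

Under the paper's finding, the memorized paragraph should show relatively larger gradient norms in the lower layers; selecting only the highest-gradient weights from `model.named_parameters()` and fine-tuning them is, per the abstract, enough to unlearn the memorized continuation.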
Related papers
- Memorization Sinks: Isolating Memorization during LLM Training [20.682505625638203]
Large language models are susceptible to memorizing repeated sequences, posing privacy and copyright concerns.
We propose a new paradigm of MemSinks that promotes isolation of memorization by design.
This is the first proof-of-concept on real data demonstrating that simultaneous generalization and isolation is achievable.
arXiv Detail & Related papers (2025-07-14T05:23:27Z)
- Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences.
We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even for out-of-distribution sequences.
arXiv Detail & Related papers (2024-07-25T07:10:31Z)
- A Multi-Perspective Analysis of Memorization in Large Language Models [10.276594755936529]
Large Language Models (LLMs) show unprecedented performance in various fields.
LLMs can generate the same content used to train them.
This research comprehensively discusses memorization from various perspectives.
arXiv Detail & Related papers (2024-05-19T15:00:50Z)
- Can Neural Network Memorization Be Localized? [102.68044087952913]
We show that memorization is a phenomenon confined to a small set of neurons in various layers of the model.
We propose a new form of dropout, called example-tied dropout, that enables us to direct the memorization of examples to an a priori determined set of neurons.
arXiv Detail & Related papers (2023-07-18T18:36:29Z)
- Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy [91.98116450958331]
We argue that verbatim memorization definitions are too restrictive and fail to capture more subtle forms of memorization.
To illustrate this, we design and implement an efficient defense that perfectly prevents all verbatim memorization, yet this "perfect" filter does not prevent the leakage of training data.
We conclude by discussing potential alternative definitions and why defining memorization is a difficult yet crucial open question for neural language models.
arXiv Detail & Related papers (2022-10-31T17:57:55Z)
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models [64.22311189896888]
We study exact memorization in causal and masked language modeling, across model sizes and throughout the training process.
Surprisingly, we show that larger models can memorize a larger portion of the data before over-fitting and tend to forget less throughout the training process.
arXiv Detail & Related papers (2022-05-22T07:43:50Z)
- Quantifying Memorization Across Neural Language Models [61.58529162310382]
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized data verbatim.
This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others).
We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data.
arXiv Detail & Related papers (2022-02-15T18:48:31Z)
- Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data.
An open question in previous studies of language model memorization is how to filter out "common" memorization.
We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training; a sketch of this definition follows the list.
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
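As a rough illustration of the counterfactual memorization notion summarized in the last entry above, the following LaTeX snippet writes the definition as the gap between a model's expected performance on a document when that document is in versus out of the training subset. The notation (training subsets S, model f_S trained on S, per-example performance measure M) is assumed from the summary above rather than copied from the paper, so treat it as an approximation.

```latex
% Sketch of counterfactual memorization for a training example x
% (notation assumed, not taken verbatim from the paper):
% f_S is a model trained on subset S; M(f, x) measures f's performance on x.
\[
  \mathrm{mem}(x)
  \;=\;
  \mathbb{E}_{S \,:\, x \in S}\big[ M(f_S, x) \big]
  \;-\;
  \mathbb{E}_{S' \,:\, x \notin S'}\big[ M(f_{S'}, x) \big].
\]
```

A large gap means the model's behavior on x depends heavily on whether x itself was seen during training rather than on patterns shared by many documents, which is how counterfactual memorization filters out the "common" memorization mentioned above.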