Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs
- URL: http://arxiv.org/abs/2508.02573v1
- Date: Mon, 04 Aug 2025 16:27:56 GMT
- Title: Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs
- Authors: Jérémie Dentan, Davide Buscaldi, Sonia Vanier
- Abstract summary: Verbatim memorization in Large Language Models is a multifaceted phenomenon involving distinct underlying mechanisms. We introduce a novel method to analyze the different forms of memorization described by the existing taxonomy. We develop a custom visual interpretability technique to localize the regions of the attention weights involved in each form of memorization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Verbatim memorization in Large Language Models (LLMs) is a multifaceted phenomenon involving distinct underlying mechanisms. We introduce a novel method to analyze the different forms of memorization described by the existing taxonomy. Specifically, we train Convolutional Neural Networks (CNNs) on the attention weights of the LLM and evaluate the alignment between this taxonomy and the attention weights involved in decoding. We find that the existing taxonomy performs poorly and fails to reflect distinct mechanisms within the attention blocks. We propose a new taxonomy that maximizes alignment with the attention weights, consisting of three categories: memorized samples that are guessed using language modeling abilities, memorized samples that are recalled due to high duplication in the training set, and non-memorized samples. Our results reveal that few-shot verbatim memorization does not correspond to a distinct attention mechanism. We also show that a significant proportion of extractable samples are in fact guessed by the model and should therefore be studied separately. Finally, we develop a custom visual interpretability technique to localize the regions of the attention weights involved in each form of memorization.
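As a rough illustration of the classification setup described in the abstract, the sketch below shows how per-head attention maps extracted from an LLM could be fed to a small CNN that predicts one of the three proposed categories (guessed, recalled, non-memorized). The `AttentionCNN` module, its layer sizes, and the toy input shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a small CNN that classifies a sample
# into three memorization categories -- "guessed", "recalled", "non-memorized" --
# from the LLM's attention weights. Shapes and hyperparameters are assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 3  # guessed / recalled / non-memorized

class AttentionCNN(nn.Module):
    """CNN over per-head attention maps, treated as image channels."""
    def __init__(self, num_heads: int, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(num_heads, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # make the classifier sequence-length independent
        )
        self.classifier = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (batch, num_heads, seq_len, seq_len) attention weights
        # extracted from one transformer layer of the LLM.
        x = self.features(attn)
        return self.classifier(x.flatten(1))

# Toy usage: 8 samples, 12 attention heads, 64-token context.
model = AttentionCNN(num_heads=12)
dummy_attn = torch.rand(8, 12, 64, 64).softmax(dim=-1)  # placeholder attention maps
logits = model(dummy_attn)    # shape (8, 3)
print(logits.argmax(dim=-1))  # predicted memorization category per sample
```

The paper additionally localizes which regions of the attention weights drive each prediction with a custom visual interpretability technique; a generic gradient-based saliency over the input `attn` tensor would be one way to approximate that idea, but it is not the method used by the authors.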
Related papers
- A Geometric Framework for Understanding Memorization in Generative Models [11.263296715798374]
Recent work has shown that deep generative models can be capable of memorizing and reproducing training datapoints when deployed. These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization. We propose the manifold memorization hypothesis (MMH), a geometric framework which leverages the manifold hypothesis into a clear language in which to reason about memorization.
arXiv Detail & Related papers (2024-10-31T18:09:01Z)
- Exploring Local Memorization in Diffusion Models via Bright Ending Attention [62.979954692036685]
"bright ending" (BE) anomaly in text-to-image diffusion models prone to memorizing training images.<n>We propose a simple yet effective method to integrate BE into existing frameworks.
arXiv Detail & Related papers (2024-10-29T02:16:01Z)
- Predicting memorization within Large Language Models fine-tuned for classification [0.0]
Large Language Models memorize a significant proportion of their training data, posing a serious threat when disclosed at inference time. We propose a new approach to detect memorized samples a priori in LLMs fine-tuned for classification tasks. Our method is supported by new theoretical results and requires a low computational budget.
arXiv Detail & Related papers (2024-09-27T15:53:55Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating different benchmarks and applying the proper training strategy, even a 2.7B small-scale model can perform on par with larger models of 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
- Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences.
We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even for out-of-distribution sequences.
arXiv Detail & Related papers (2024-07-25T07:10:31Z)
- Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon [22.271015657198927]
We break memorization down into a taxonomy: recitation of highly duplicated sequences, reconstruction of inherently predictable sequences, and recollection of sequences that are neither. By analyzing dependencies and inspecting the weights of a predictive model, we find that different factors influence the likelihood of memorization differently depending on the taxonomic category.
arXiv Detail & Related papers (2024-06-25T17:32:16Z)
- A Multi-Perspective Analysis of Memorization in Large Language Models [10.276594755936529]
Large Language Models (LLMs) show unprecedented performance in various fields.
LLMs can generate the same content used to train them.
This research comprehensively discusses memorization from various perspectives.
arXiv Detail & Related papers (2024-05-19T15:00:50Z)
- Exploring Memorization in Fine-tuned Language Models [53.52403444655213]
We conduct the first comprehensive analysis to explore language models' memorization during fine-tuning across tasks.
Our studies with open-sourced and our own fine-tuned LMs across various tasks indicate that memorization presents a strong disparity among different fine-tuning tasks.
We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.
arXiv Detail & Related papers (2023-10-10T15:41:26Z)
- Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
arXiv Detail & Related papers (2022-10-17T20:15:24Z)
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models [64.22311189896888]
We study exact memorization in causal and masked language modeling, across model sizes and throughout the training process.
Surprisingly, we show that larger models can memorize a larger portion of the data before over-fitting and tend to forget less throughout the training process.
arXiv Detail & Related papers (2022-05-22T07:43:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.