The Pitfalls of Memorization: When Memorization Hurts Generalization
- URL: http://arxiv.org/abs/2412.07684v1
- Date: Tue, 10 Dec 2024 17:18:33 GMT
- Title: The Pitfalls of Memorization: When Memorization Hurts Generalization
- Authors: Reza Bayat, Mohammad Pezeshki, Elvis Dohmatob, David Lopez-Paz, Pascal Vincent
- Abstract summary: Memorization can reduce training loss to zero, leaving no incentive to learn robust, generalizable patterns. We propose memorization-aware training (MAT), which uses held-out predictions as a signal of memorization to shift a model's logits.
- Score: 28.5600484308805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks often learn simple explanations that fit the majority of the data while memorizing exceptions that deviate from these explanations. This behavior leads to poor generalization when the learned explanations rely on spurious correlations. In this work, we formalize the interplay between memorization and generalization, showing that spurious correlations lead to particularly poor generalization when combined with memorization. Memorization can reduce training loss to zero, leaving no incentive to learn robust, generalizable patterns. To address this, we propose memorization-aware training (MAT), which uses held-out predictions as a signal of memorization to shift a model's logits. MAT encourages learning robust patterns invariant across distributions, improving generalization under distribution shifts.
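A hedged PyTorch-style sketch of the idea in the abstract: held-out predictions enter the loss as a logit shift. The logit-adjustment form, the function name mat_loss, and the tau and eps knobs are illustrative assumptions, not the paper's published recipe.

```python
import torch
import torch.nn.functional as F

def mat_loss(logits, targets, heldout_probs, tau=1.0, eps=1e-8):
    """Cross-entropy on logits shifted by held-out predictions (assumed form).

    logits        -- (batch, num_classes) raw outputs of the model being trained
    targets       -- (batch,) integer class labels
    heldout_probs -- (batch, num_classes) class probabilities for the same
                     examples from models that never trained on them
                     (e.g., fit on complementary folds)
    tau           -- strength of the memorization correction (assumed knob)
    """
    # Folding the held-out log-probabilities into the softmax uses
    # out-of-sample predictions as the memorization signal: examples the
    # held-out models already explain contribute little gradient, so
    # driving training loss to zero by memorizing exceptions pays off less.
    adjusted = logits + tau * torch.log(heldout_probs + eps)
    return F.cross_entropy(adjusted, targets)
```

In practice heldout_probs would come from k-fold training, so every example is scored by a model that never saw it; that out-of-sample scoring is the held-out signal the abstract refers to.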
Related papers
- Bigger Isn't Always Memorizing: Early Stopping Overparameterized Diffusion Models [51.03144354630136]
Generalization in natural data domains is progressively achieved during training before the onset of memorization. Generalization vs. memorization is then best understood as a competition between time scales. We show that this phenomenology is recovered in diffusion models learning a simple probabilistic context-free grammar with random rules.
arXiv Detail & Related papers (2025-05-22T17:40:08Z) - Memorize or Generalize? Evaluating LLM Code Generation with Evolved Questions [33.58518352911762]
Large Language Models (LLMs) are known to exhibit a memorization phenomenon in code generation.
In this paper, we investigate this phenomenon by designing three evolution strategies to create variants: mutation, paraphrasing, and code-rewriting.
As expected, the memorization score rises as supervised fine-tuning proceeds, even before overfitting sets in, indicating increasingly severe memorization.
arXiv Detail & Related papers (2025-03-04T05:39:24Z) - Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences.
We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even for out-of-distribution sequences.
arXiv Detail & Related papers (2024-07-25T07:10:31Z) - What do larger image classifiers memorise? [64.01325988398838]
We show that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes.
We find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation.
arXiv Detail & Related papers (2023-10-09T01:52:07Z) - Studying Generalization on Memory-Based Methods in Continual Learning [9.896917981912106]
Memory-based methods store a percentage of previous data distributions to be used during training.
We show that these methods can help in traditional in-distribution generalization, but can strongly impair out-of-distribution generalization by learning spurious features and correlations.
arXiv Detail & Related papers (2023-06-16T14:59:17Z) - Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning [113.58691755215663]
We develop RetroPrompt to help a model strike a balance between generalization and memorization.
In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances.
Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-05-29T16:07:30Z) - Towards Differential Relational Privacy and its use in Question Answering [109.4452196071872]
Memorization of relations between entities in a dataset can lead to privacy issues when using a trained question answering model.
We quantify this phenomenon and provide a possible definition of Differential Relational Privacy (DPRP).
We illustrate these concepts in experiments with large-scale models for Question Answering.
arXiv Detail & Related papers (2022-03-30T22:59:24Z) - Quantifying Memorization Across Neural Language Models [61.58529162310382]
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized data verbatim.
This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others).
We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data.
arXiv Detail & Related papers (2022-02-15T18:48:31Z) - Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data.
An open question in previous studies of language model memorization is how to filter out "common" memorization.
We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training (a schematic version of this definition is sketched after this list).
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
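The counterfactual definition in the last entry has a standard schematic form: the expected performance on an example when it is in the training set, minus the expected performance when it is left out. A minimal rendering, where S is a random training subset, f_S the model trained on S, and M a per-example performance measure (these symbols are assumed notation, not taken from the summary above):

```latex
% Counterfactual memorization of a training example x (schematic):
% expected performance on x of models whose training subset S contains x,
% minus that of models whose training subset excludes x.
\[
\mathrm{mem}(x) \;=\;
  \mathbb{E}_{S \ni x}\big[\, M(f_S, x) \,\big]
  \;-\;
  \mathbb{E}_{S \not\ni x}\big[\, M(f_S, x) \,\big]
\]
```

Under this reading, "common" memorization (content every model predicts well whether or not it saw the document) cancels in the difference, which is what lets the measure filter it out.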