Understanding Transformer Memorization Recall Through Idioms
- URL: http://arxiv.org/abs/2210.03588v2
- Date: Tue, 11 Oct 2022 17:51:06 GMT
- Title: Understanding Transformer Memorization Recall Through Idioms
- Authors: Adi Haviv, Ido Cohen, Jacob Gidron, Roei Schuster, Yoav Goldberg and
Mor Geva
- Abstract summary: We offer the first methodological framework for probing and characterizing recall of memorized sequences in language models.
We analyze the internal prediction construction process by interpreting the model's hidden representations as a gradual refinement of the output probability distribution.
Our work makes a first step towards understanding memory recall, and provides a methodological basis for future studies of transformer memorization.
- Score: 42.28269674547148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To produce accurate predictions, language models (LMs) must balance between
generalization and memorization. Yet, little is known about the mechanism by
which transformer LMs employ their memorization capacity. When does a model
decide to output a memorized phrase, and how is this phrase then retrieved from
memory? In this work, we offer the first methodological framework for probing
and characterizing recall of memorized sequences in transformer LMs. First, we
lay out criteria for detecting model inputs that trigger memory recall, and
propose idioms as inputs that fulfill these criteria. Next, we construct a
dataset of English idioms and use it to compare model behavior on memorized vs.
non-memorized inputs. Specifically, we analyze the internal prediction
construction process by interpreting the model's hidden representations as a
gradual refinement of the output probability distribution. We find that across
different model sizes and architectures, memorized predictions are a two-step
process: early layers promote the predicted token to the top of the output
distribution, and upper layers increase model confidence. This suggests that
memorized information is stored and retrieved in the early layers of the
network. Last, we demonstrate the utility of our methodology beyond idioms in
memorized factual statements. Overall, our work makes a first step towards
understanding memory recall, and provides a methodological basis for future
studies of transformer memorization.
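The two-step picture described in the abstract can be reproduced at a small scale with a logit-lens-style readout: project each layer's hidden state through the LM head and track the rank and probability of the token the model ultimately predicts. The sketch below illustrates that general technique and is not the authors' code; the model name ("gpt2"), the example idiom, and the GPT-2-specific final LayerNorm handling are assumptions.

```python
# Minimal sketch of a logit-lens-style per-layer readout (illustration only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with accessible hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

prompt = "Actions speak louder than"  # idiom prefix; the completion " words" is likely memorized
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# The model's final prediction at the last position.
final_logits = out.logits[0, -1]
pred_id = int(final_logits.argmax())
print("predicted token:", repr(tok.decode(pred_id)))

# Read each layer's last-position hidden state through the LM head to see when
# the predicted token reaches rank 1 (promotion) and how its probability grows
# (confidence). GPT-2 applies a final LayerNorm before the head, so intermediate
# states are normalized the same way; other architectures may need a different readout.
lm_head = model.get_output_embeddings()
ln_f = model.transformer.ln_f
n_layers = len(out.hidden_states) - 1
for layer, h in enumerate(out.hidden_states):  # 0 = embeddings, 1..L = transformer blocks
    vec = h[0, -1]
    if layer < n_layers:  # the final element of hidden_states is already normalized
        vec = ln_f(vec)
    logits = lm_head(vec)
    probs = torch.softmax(logits, dim=-1)
    rank = int((logits > logits[pred_id]).sum().item()) + 1
    print(f"layer {layer:2d}: rank of predicted token = {rank:5d}, prob = {probs[pred_id].item():.4f}")
```

Under the paper's finding, a memorized idiom should show the predicted token's rank dropping to 1 in early layers while its probability keeps rising in the upper layers; non-memorized inputs should not show this pattern as sharply.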
Related papers
- A Geometric Framework for Understanding Memorization in Generative Models [11.263296715798374]
Recent work has shown that deep generative models can memorize and reproduce training datapoints when deployed.
These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization.
We propose the manifold memorization hypothesis (MMH), a geometric framework that uses the manifold hypothesis to provide a clear language in which to reason about memorization.
arXiv Detail & Related papers (2024-10-31T18:09:01Z)
- Predictive Attractor Models [9.947717243638289]
We propose Predictive Attractor Models (PAM), a novel sequence memory architecture with desirable generative properties.
PAM avoids catastrophic forgetting by uniquely representing past context through lateral inhibition in cortical minicolumns.
We show that PAM is trained with local computations through Hebbian plasticity rules in a biologically plausible framework.
arXiv Detail & Related papers (2024-10-03T12:25:01Z)
- Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications.
We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences.
We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even for out-of-distribution sequences.
arXiv Detail & Related papers (2024-07-25T07:10:31Z)
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.
We show that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z)
- A Multi-Perspective Analysis of Memorization in Large Language Models [10.276594755936529]
Large Language Models (LLMs) show unprecedented performance in various fields.
LLMs can also reproduce the same content that was used to train them.
This research comprehensively discusses memorization from various perspectives.
arXiv Detail & Related papers (2024-05-19T15:00:50Z)
- Explaining How Transformers Use Context to Build Predictions [0.1749935196721634]
Language Generation Models produce words based on the previous context.
It is still unclear how prior words affect the model's decision throughout the layers.
We leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation.
arXiv Detail & Related papers (2023-05-21T18:29:10Z)
- Emergent and Predictable Memorization in Large Language Models [23.567027014457775]
Memorization, or the tendency of large language models to output entire sequences from their training data verbatim, is a key concern for safely deploying language models.
We seek to predict which sequences will be memorized before a large model is fully trained, by extrapolating the memorization behavior of lower-compute trial runs.
We provide further novel discoveries on the distribution of memorization scores across models and data.
arXiv Detail & Related papers (2023-04-21T17:58:31Z)
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes catastrophic forgetting by saving a subset of past data into a memory bank and using it to prevent forgetting when training on future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
- Quantifying Memorization Across Neural Language Models [61.58529162310382]
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized data verbatim.
This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others).
We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data.
arXiv Detail & Related papers (2022-02-15T18:48:31Z)
- Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data.
An open question in previous studies of language model memorization is how to filter out "common" memorization.
We formulate a notion of counterfactual memorization, which characterizes how a model's predictions change if a particular document is omitted during training (see the formula sketch after this list).
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
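As a concrete reading of the counterfactual memorization entry above, the quantity is usually written as a difference in expected performance between models trained with and without the document; the notation below (training algorithm A, training set S, performance measure M on example x) is assumed rather than copied from that paper.

```latex
% Counterfactual memorization of a training example x (assumed notation):
% expected performance on x of models trained on sets containing x, minus
% expected performance of models trained on sets excluding x.
\mathrm{mem}(x) \;=\;
  \mathbb{E}_{S \,:\, x \in S}\bigl[\, M\!\left(A(S),\, x\right) \bigr]
  \;-\;
  \mathbb{E}_{S' \,:\, x \notin S'}\bigl[\, M\!\left(A(S'),\, x\right) \bigr]
```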
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.