Recognition, recall, and retention of few-shot memories in large
language models
- URL: http://arxiv.org/abs/2303.17557v1
- Date: Thu, 30 Mar 2023 17:26:16 GMT
- Title: Recognition, recall, and retention of few-shot memories in large
language models
- Authors: A. Emin Orhan
- Abstract summary: We investigate simple recognition, recall, and retention experiments with large language models.
We find that a single exposure is generally sufficient for a model to achieve near perfect accuracy.
The flip side of this remarkable capacity for fast learning is that precise memories are quickly overwritten.
- Score: 21.067139116005592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training of modern large language models (LLMs) takes place in a regime
where most training examples are seen only a few times by the model during the
course of training. What does a model remember about such examples seen only a
few times during training and how long does that memory persist in the face of
continuous training with new examples? Here, we investigate these questions
through simple recognition, recall, and retention experiments with LLMs. In
recognition experiments, we ask if the model can distinguish the seen example
from a novel example; in recall experiments, we ask if the model can correctly
recall the seen example when cued by a part of it; and in retention
experiments, we periodically probe the model's memory for the original examples
as the model is trained continuously with new examples. We find that a single
exposure is generally sufficient for a model to achieve near perfect accuracy
even in very challenging recognition experiments. We estimate that the
recognition performance of even small language models easily exceeds human
recognition performance reported in similar experiments with humans (Shepard,
1967). Achieving near perfect recall takes more exposures, but most models can
do it in just 3 exposures. The flip side of this remarkable capacity for fast
learning is that precise memories are quickly overwritten: recall performance
for the original examples drops steeply over the first 10 training updates with
new examples, followed by a more gradual decline. Even after 100K updates,
however, some of the original examples are still recalled near perfectly. A
qualitatively similar retention pattern has been observed in human long-term
memory retention studies before (Bahrick, 1984). Finally, recognition is much
more robust to interference than recall and memory for natural language
sentences is generally superior to memory for stimuli without structure.
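A minimal sketch of how the recognition and recall probes might be implemented for an autoregressive LM is given below; it is not the paper's code, and the choice of model (`gpt2`), the likelihood-based recognition score, and the greedy prefix-cued recall check are illustrative assumptions. Retention would then amount to re-running these probes periodically while training continues on new examples.

```python
# Hedged sketch of recognition and recall probes for a causal LM (assumptions:
# gpt2 as the model, total log-likelihood as the recognition score, greedy
# prefix-cued generation as the recall check). Not the paper's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def log_likelihood(text: str) -> float:
    """Total log-probability the model assigns to a text."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token.
    return -out.loss.item() * (ids.shape[1] - 1)

def recognizes(seen: str, novel: str) -> bool:
    """Recognition probe: the previously seen example should outscore a novel foil."""
    return log_likelihood(seen) > log_likelihood(novel)

def recalls(seen: str, cue_fraction: float = 0.5) -> bool:
    """Recall probe: cue with a prefix and check that greedy decoding reproduces the rest."""
    ids = tok(seen, return_tensors="pt").input_ids
    cut = max(1, int(ids.shape[1] * cue_fraction))
    with torch.no_grad():
        gen = model.generate(ids[:, :cut], max_new_tokens=ids.shape[1] - cut, do_sample=False)
    return torch.equal(gen[0, : ids.shape[1]], ids[0])

# In the experiments described above, these probes would be applied to examples
# the model was recently trained on, versus matched novel examples.
print(recognizes("The cat sat quietly on the warm windowsill.",
                 "A delivery van idled outside the bakery all morning."))
```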
Related papers
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
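As a rough illustration of the difference-in-differences idea in this setting (a sketch under assumed inputs, not the paper's estimator): compare how a model's log-likelihood of instances changes over a training window for instances trained on in that window ("treated") versus instances held out of it ("control"), and take the difference of those two changes.

```python
# Hedged difference-in-differences sketch for memorisation. Assumed inputs are
# per-instance log-likelihoods measured before and after a training window,
# split into treated (trained on in the window) and control (held out) sets.
from statistics import mean

def did_memorisation(treated_before, treated_after, control_before, control_after):
    """(Average change on treated instances) minus (average change on controls)."""
    treated_change = mean(a - b for b, a in zip(treated_before, treated_after))
    control_change = mean(a - b for b, a in zip(control_before, control_after))
    return treated_change - control_change

# Toy numbers: treated log-likelihoods improve by ~1.5 nats, controls by ~0.3,
# so the estimated causal effect of training on an instance is ~1.2 nats.
print(did_memorisation([-5.0, -6.0], [-3.4, -4.6], [-5.2, -5.8], [-4.9, -5.5]))
```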
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Unintended Memorization in Large ASR Models, and How to Mitigate It [16.047859326721046]
Auditing memorization in large non-auto-regressive automatic speech recognition (ASR) models has been challenging.
We design a simple auditing method to measure memorization in large ASR models without the extra compute overhead.
We show that in large-scale distributed training, clipping the average gradient on each compute core maintains neutral model quality and compute cost.
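A minimal sketch of the per-core clipping idea, assuming each core holds the average gradient of its local minibatch (an illustrative reconstruction, not the paper's implementation):

```python
# Hedged sketch: clip each core's averaged minibatch gradient to a norm bound,
# then average across cores, instead of clipping every per-example gradient.
import numpy as np

def clip_norm(grad: np.ndarray, max_norm: float) -> np.ndarray:
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)

def per_core_clipped_update(per_core_grads, max_norm=1.0):
    """per_core_grads: one averaged gradient array per compute core."""
    return np.mean([clip_norm(g, max_norm) for g in per_core_grads], axis=0)

# Toy example with 4 "cores".
rng = np.random.default_rng(0)
print(per_core_clipped_update([rng.normal(size=8) for _ in range(4)]))
```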
arXiv Detail & Related papers (2023-10-18T06:45:49Z)
- What do larger image classifiers memorise? [64.01325988398838]
We show that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes.
We find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation.
arXiv Detail & Related papers (2023-10-09T01:52:07Z)
- Measuring Forgetting of Memorized Training Examples [80.9188503645436]
We show machine learning models exhibit two seemingly contradictory phenomena: training data memorization and various forms of forgetting.
In memorization, models overfit specific training examples and become susceptible to privacy attacks; in forgetting, examples seen early in training are no longer memorized by the end.
We show that standard models empirically do forget trained examples over time, and identify nondeterminism in training as a potential explanation.
arXiv Detail & Related papers (2022-06-30T20:48:26Z)
- Can deep learning match the efficiency of human visual long-term memory in storing object details? [21.067139116005592]
Humans have a remarkably large capacity to store detailed visual information in long-term memory.
This paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory.
arXiv Detail & Related papers (2022-04-27T17:00:37Z)
- Quantifying Memorization Across Neural Language Models [61.58529162310382]
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized data verbatim.
This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others).
We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data.
arXiv Detail & Related papers (2022-02-15T18:48:31Z)
- Chasing the Tail in Monocular 3D Human Reconstruction with Prototype Memory [98.36233875637168]
In this work, we 1) identify and analyze the learning obstacle posed by rare poses in monocular 3D human reconstruction, and 2) propose a prototype memory-augmented network, PM-Net, that effectively improves performance on predicting rare poses.
arXiv Detail & Related papers (2020-12-29T12:57:22Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model being trained for the assessed objective, without an external replay buffer or generator network.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.