Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks
- URL: http://arxiv.org/abs/2408.04965v1
- Date: Fri, 9 Aug 2024 09:30:57 GMT
- Title: Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks
- Authors: Verna Dankers, Ivan Titov
- Abstract summary: Memorisation is a natural part of learning from real-world data.
We show that memorisation is a gradual process rather than a localised one.
- Score: 33.1099258648462
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memorisation is a natural part of learning from real-world data: neural models pick up on atypical input-output combinations and store those training examples in their parameter space. That this happens is well-known, but how and where are questions that remain largely unanswered. Given a multi-layered neural model, where does memorisation occur in the millions of parameters? Related work reports conflicting findings: a dominant hypothesis based on image classification is that lower layers learn generalisable features and that deeper layers specialise and memorise. Work from NLP suggests this does not apply to language models, but has been mainly focused on memorisation of facts. We expand the scope of the localisation question to 12 natural language classification tasks and apply 4 memorisation localisation techniques. Our results indicate that memorisation is a gradual process rather than a localised one, establish that memorisation is task-dependent, and give nuance to the generalisation first, memorisation second hypothesis.
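The four localisation techniques the paper applies are not named in this summary. As a hedged, illustrative sketch only (the toy data, architecture, and hyperparameters below are assumptions, not the paper's setup), one common localisation probe is layer rewinding: train a network on data containing flipped labels, which can only be fit by memorisation, then reset one layer at a time to its initial weights and compare the accuracy drop on flipped versus clean training examples. Layers whose rewinding selectively destroys the flipped-label fit are where memorisation plausibly resides.

```python
# Hedged sketch of layer rewinding for memorisation localisation.
# Everything here (toy data, tiny MLP, training budget) is an illustrative
# assumption, not the setup used by Dankers & Titov.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy task: 2-class blobs; 10% of labels are flipped, so flipped examples
# can only be fit by memorisation, not by the generalising decision rule.
n = 2000
X = torch.randn(n, 20)
y = (X[:, 0] > 0).long()
noisy = torch.rand(n) < 0.10
y[noisy] = 1 - y[noisy]

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
init_state = {k: v.clone() for k, v in model.state_dict().items()}

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(2000):  # full-batch training until the noise is (mostly) fit
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

def accuracy(m, mask):
    with torch.no_grad():
        return (m(X).argmax(1) == y)[mask].float().mean().item()

trained_state = {k: v.clone() for k, v in model.state_dict().items()}
for idx in ["0", "2", "4"]:  # indices of the three Linear layers
    # Rewind only this layer to initialisation; keep the rest trained.
    state = {k: (init_state[k] if k.startswith(idx + ".") else v)
             for k, v in trained_state.items()}
    model.load_state_dict(state)
    print(f"rewind layer {idx}: clean acc {accuracy(model, ~noisy):.2f}, "
          f"flipped acc {accuracy(model, noisy):.2f}")
model.load_state_dict(trained_state)
```

If memorisation were localised in the deepest layer, rewinding it would collapse flipped-label accuracy while leaving clean accuracy largely intact; the paper's finding that memorisation is gradual instead predicts a more diffuse pattern across layers.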
Related papers
- Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon [22.271015657198927]
We break memorization down into a taxonomy: recitation of highly duplicated sequences, reconstruction of inherently predictable sequences, and recollection of sequences that are neither.
By analyzing dependencies and inspecting the weights of a predictive model, we find that different factors influence the likelihood of memorization differently depending on the taxonomic category.
arXiv Detail & Related papers (2024-06-25T17:32:16Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics (see the sketch following this list).
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation [41.816534359921896]
We use the counterfactual memorisation metric to build a resource that places 5M NMT datapoints on a memorisation-generalisation map.
We also illustrate how the datapoints' surface-level characteristics and a model's per-datum training signals are predictive of memorisation in NMT.
arXiv Detail & Related papers (2023-11-09T14:03:51Z)
- SoK: Memorisation in machine learning [5.563171090433323]
Quantifying the impact of individual data samples on machine learning models is an open research problem.
In this work we unify a broad range of previous definitions and perspectives on memorisation in ML.
We discuss their interplay with model generalisation and the implications of these phenomena for data privacy.
arXiv Detail & Related papers (2023-11-06T12:59:18Z)
- What do larger image classifiers memorise? [64.01325988398838]
We show that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes.
We find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation.
arXiv Detail & Related papers (2023-10-09T01:52:07Z)
- Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models [49.39276272693035]
Large-scale pre-trained language models have shown remarkable memorizing ability.
Vanilla neural networks without pre-training have long been observed to suffer from catastrophic forgetting.
We find that 1) vanilla language models are forgetful; 2) pre-training leads to retentive language models; and 3) knowledge relevance and diversification significantly influence memory formation.
arXiv Detail & Related papers (2023-05-16T03:50:38Z)
- Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
arXiv Detail & Related papers (2022-10-17T20:15:24Z)
- How Relevant is Selective Memory Population in Lifelong Language Learning? [15.9310767099639]
State-of-the-art approaches rely on sparse experience replay as the primary mechanism for preventing forgetting.
We investigate how relevant the selective memory population is in the lifelong learning process of text classification and question-answering tasks.
arXiv Detail & Related papers (2022-10-03T13:52:54Z)
- Pin the Memory: Learning to Generalize Semantic Segmentation [68.367763672095]
We present a novel memory-guided domain generalization method for semantic segmentation based on a meta-learning framework.
Our method abstracts the conceptual knowledge of semantic classes into a categorical memory that is constant across domains.
arXiv Detail & Related papers (2022-04-07T17:34:01Z)
- Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data.
An open question in previous studies of language model memorization is how to filter out "common" memorization.
We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training (see the estimator sketch at the end of this list).
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
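For the Causal Estimation of Memorisation Profiles entry above, a minimal hedged sketch of the general difference-in-differences design on entirely synthetic numbers, not the paper's estimator: memorisation is read off as the extra improvement, after a training step, of instances the model was trained on (treated) relative to instances it was not (control), which cancels the improvement both groups share through generalisation.

```python
# Hedged sketch of a difference-in-differences (DiD) memorisation estimate.
# The numbers are synthetic; effect sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-instance log-likelihoods before/after a training step t.
treated_before = rng.normal(-3.0, 0.5, size=1000)
control_before = rng.normal(-3.0, 0.5, size=1000)
treated_after = treated_before + 0.4 + 1.1  # generalisation (0.4) + memorisation (1.1)
control_after = control_before + 0.4        # generalisation improves both groups

# DiD subtracts the shared generalisation trend, isolating memorisation.
did = (treated_after.mean() - treated_before.mean()) \
      - (control_after.mean() - control_before.mean())
print(f"estimated memorisation effect: {did:.2f} nats")  # ~1.10
```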
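And for the Counterfactual Memorization entry, a hedged sketch of the counterfactual estimator on a toy problem: train many models on random subsets and score each example by the gap between its accuracy when included in versus held out of training. The data, model class, and subset scheme below are illustrative assumptions, not the paper's experimental setup.

```python
# Hedged sketch of counterfactual memorisation:
# mem(x) = P(correct on x | x in training set) - P(correct on x | x held out),
# estimated over many models trained on random halves of the data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 10))
y = (X[:, 0] > 0).astype(int)
y[:25] = 1 - y[:25]  # flip a few labels: these should get high mem scores

hits_in = np.zeros(n); cnt_in = np.zeros(n)
hits_out = np.zeros(n); cnt_out = np.zeros(n)
for _ in range(40):  # 40 models, each trained on a random half of the data
    mask = rng.random(n) < 0.5
    clf = LogisticRegression().fit(X[mask], y[mask])
    correct = (clf.predict(X) == y).astype(float)
    hits_in[mask] += correct[mask];    cnt_in[mask] += 1
    hits_out[~mask] += correct[~mask]; cnt_out[~mask] += 1

mem = hits_in / np.maximum(cnt_in, 1) - hits_out / np.maximum(cnt_out, 1)
print("mean mem, flipped labels:", round(mem[:25].mean(), 2))  # high
print("mean mem, clean labels:  ", round(mem[25:].mean(), 2))  # near zero
```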