What do larger image classifiers memorise?
- URL: http://arxiv.org/abs/2310.05337v1
- Date: Mon, 9 Oct 2023 01:52:07 GMT
- Title: What do larger image classifiers memorise?
- Authors: Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
- Abstract summary: We show that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes.
We find that knowledge distillation, an effective and popular model compression technique, tends to inhibit memorisation, while also improving generalisation.
- Score: 64.01325988398838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of modern neural networks has prompted study of the connection
between memorisation and generalisation: overparameterised models generalise
well, despite being able to perfectly fit (memorise) completely random labels.
To carefully study this issue, Feldman proposed a metric to quantify the degree
of memorisation of individual training examples, and empirically computed the
corresponding memorisation profile of a ResNet on image classification
benchmarks. While an exciting first glimpse into what real-world models
memorise, this leaves open a fundamental question: do larger neural models
memorise more? We present a comprehensive empirical analysis of this question
on image classification benchmarks. We find that training examples exhibit an
unexpectedly diverse set of memorisation trajectories across model sizes: most
samples experience decreased memorisation under larger models, while the rest
exhibit cap-shaped or increasing memorisation. We show that various proxies for
the Feldman memorisation score fail to capture these fundamental trends.
Lastly, we find that knowledge distillation, an effective and popular model
compression technique, tends to inhibit memorisation, while also improving
generalisation. Specifically, memorisation is mostly inhibited on examples with
increasing memorisation trajectories, thus pointing at how distillation
improves generalisation.
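For context, the Feldman memorisation score referenced above quantifies, for each training example, how much the model's probability of predicting that example's label drops when the example is held out of training. In the notation of Feldman (2019), for a learning algorithm A and training set S:

```latex
% Feldman's per-example memorisation score: the change in the probability of
% predicting the true label y_i when example i is removed from training set S.
\mathrm{mem}(\mathcal{A}, S, i)
  = \Pr_{h \sim \mathcal{A}(S)}\bigl[ h(x_i) = y_i \bigr]
  - \Pr_{h \sim \mathcal{A}(S \setminus \{(x_i, y_i)\})}\bigl[ h(x_i) = y_i \bigr]
```

Computing this directly would require retraining once per training example, so in practice the memorisation profile is estimated by Monte-Carlo subsampling, in the spirit of Feldman & Zhang (2020): train many models on random subsets, then compare each example's accuracy under models that saw it versus models that did not. Below is a minimal sketch of such an estimator, not code from the paper; `train_model` and its `predict` method are hypothetical placeholders.

```python
import numpy as np

def estimate_memorisation(X, y, train_model, n_models=100, subset_frac=0.7, seed=0):
    """Subsampling estimator for per-example memorisation scores.

    Trains `n_models` models on random subsets of (X, y), then scores each
    example by the gap between the accuracy of models trained WITH it and
    models trained WITHOUT it (a Monte-Carlo proxy for the leave-one-out
    definition above).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    correct_in = np.zeros(n)   # summed correct predictions, models that saw example i
    count_in = np.zeros(n)     # number of such models
    correct_out = np.zeros(n)  # summed correct predictions, models that did not see i
    count_out = np.zeros(n)

    for _ in range(n_models):
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        in_subset = np.zeros(n, dtype=bool)
        in_subset[idx] = True

        model = train_model(X[idx], y[idx])                # user-supplied training routine
        correct = (model.predict(X) == y).astype(float)    # per-example 0/1 correctness

        correct_in += np.where(in_subset, correct, 0.0)
        count_in += in_subset
        correct_out += np.where(~in_subset, correct, 0.0)
        count_out += ~in_subset

    # Guard against examples that landed in every subset or in none.
    acc_in = np.divide(correct_in, count_in, out=np.zeros(n), where=count_in > 0)
    acc_out = np.divide(correct_out, count_out, out=np.zeros(n), where=count_out > 0)
    # High scores flag examples the model only predicts correctly when trained on them.
    return acc_in - acc_out
```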
Related papers
- Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks [33.1099258648462]
Memorisation is a natural part of learning from real-world data.
We show that memorisation is a gradual process rather than a localised one.
arXiv Detail & Related papers (2024-08-09T09:30:57Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- From seeing to remembering: Images with harder-to-reconstruct representations leave stronger memory traces [4.012995481864761]
We present a sparse coding model for compressing feature embeddings of images, and show that the reconstruction residuals from this model predict how well images are encoded into memory.
In an open memorability dataset of scene images, we show that reconstruction error not only explains memory accuracy but also response latencies during retrieval, subsuming, in the latter case, all of the variance explained by powerful vision-only models.
arXiv Detail & Related papers (2023-02-21T01:40:32Z)
- Classification and Generation of real-world data with an Associative Memory Model [0.0]
We extend the capabilities of the basic Associative Memory Model by using a Multiple-Modality framework.
By storing both the images and labels as modalities, a single Memory can be used to retrieve and complete patterns.
arXiv Detail & Related papers (2022-07-11T12:51:27Z)
- Measuring Forgetting of Memorized Training Examples [80.9188503645436]
We show machine learning models exhibit two seemingly contradictory phenomena: training data memorization and various forms of forgetting.
In memorization, models overfit specific training examples and become susceptible to privacy attacks; in forgetting, examples that appeared early in training may no longer be memorized by the end.
We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget.
arXiv Detail & Related papers (2022-06-30T20:48:26Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget that adapts to new classes without forgetting old ones.
We show that when the model's size is counted into the total budget and methods are compared at aligned memory cost, saving extra models does not consistently help.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models [64.22311189896888]
We study exact memorization in causal and masked language modeling, across model sizes and throughout the training process.
Surprisingly, we show that larger models can memorize a larger portion of the data before overfitting and tend to forget less throughout the training process.
arXiv Detail & Related papers (2022-05-22T07:43:50Z)
- Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation [117.29799759864127]
3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.
Deep networks can easily forget the non-dominant cases during the learning process, resulting in unsatisfactory performance.
We propose a memory-augmented network to learn and memorize the representative prototypes that cover diverse samples universally.
arXiv Detail & Related papers (2020-01-06T01:07:46Z)