Memorization Through the Lens of Curvature of Loss Function Around Samples
- URL: http://arxiv.org/abs/2307.05831v2
- Date: Mon, 2 Oct 2023 03:50:18 GMT
- Title: Memorization Through the Lens of Curvature of Loss Function Around Samples
- Authors: Isha Garg, Deepak Ravikumar and Kaushik Roy
- Abstract summary: We propose using the curvature of the loss function around each training sample, averaged over training epochs, as a measure of how strongly the sample is memorized.
We first show that high-curvature samples visually correspond to long-tailed, mislabeled, or conflicting samples, i.e., those most likely to be memorized.
This analysis helps us find, to the best of our knowledge, a novel failure mode on the CIFAR100 and ImageNet datasets: duplicated images with differing labels.
- Score: 10.028765645749338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are over-parameterized and easily overfit the datasets
they train on. In the extreme case, it has been shown that these networks can
memorize a training set with fully randomized labels. We propose using the
curvature of the loss function around each training sample, averaged over training
epochs, as a measure of memorization of the sample. We use this metric to study
the generalization versus memorization properties of different samples in
popular image datasets and show that it captures memorization statistics well,
both qualitatively and quantitatively. We first show that high-curvature
samples visually correspond to long-tailed, mislabeled, or conflicting samples,
those that are most likely to be memorized. This analysis helps us find, to the
best of our knowledge, a novel failure mode on the CIFAR100 and ImageNet
datasets: that of duplicated images with differing labels. Quantitatively, we
corroborate the validity of our scores via two methods. First, we validate our
scores against an independent and comprehensively calculated baseline, by
showing high cosine similarity with the memorization scores released by Feldman
and Zhang (2020). Second, we inject corrupted samples which are memorized by
the network, and show that these are learned with high curvature. To this end,
we synthetically mislabel a random subset of the dataset. We overfit a network
to it and show that sorting by curvature yields high AUROC values for
identifying the corrupted samples. An added advantage of our method is that it
is scalable, as it requires training only a single network as opposed to the
thousands trained by the baseline, while capturing the aforementioned failure
mode that the baseline fails to identify.
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels [13.314778587751588]
Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching.
It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training.
We propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels.
arXiv Detail & Related papers (2024-06-22T04:49:39Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to construct in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance.
To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful sequence.
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
- Compare learning: bi-attention network for few-shot learning [6.559037166322981]
Metric learning, a family of few-shot learning methods, addresses the few-shot challenge by first learning a deep distance metric that determines whether a pair of images belongs to the same category.
In this paper, we propose a novel approach named Bi-attention network to compare the instances, which can measure the similarity between embeddings of instances precisely, globally and efficiently.
arXiv Detail & Related papers (2022-03-25T07:39:10Z)
- An analysis of over-sampling labeled data in semi-supervised learning with FixMatch [66.34968300128631]
Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches.
This paper studies whether this common practice improves learning and how.
We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not.
arXiv Detail & Related papers (2022-01-03T12:22:26Z)
- Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data [17.7825114228313]
Corrupted labels and class imbalance are commonly encountered in practically collected training data.
Existing approaches alleviate these issues by adopting a sample re-weighting strategy.
However, samples with corrupted labels and samples from tailed classes commonly co-exist in training data.
arXiv Detail & Related papers (2021-12-30T09:20:07Z)
- Salvage Reusable Samples from Noisy Data for Robust Learning [70.48919625304]
We propose a reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep fine-grained (FG) models with web images.
Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks.
arXiv Detail & Related papers (2020-08-06T02:07:21Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead of an explicit memory buffer, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.