Characterizing Datapoints via Second-Split Forgetting
- URL: http://arxiv.org/abs/2210.15031v1
- Date: Wed, 26 Oct 2022 21:03:46 GMT
- Title: Characterizing Datapoints via Second-Split Forgetting
- Authors: Pratyush Maini, Saurabh Garg, Zachary C. Lipton, J. Zico Kolter
- Abstract summary: We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers investigating example hardness have increasingly focused on the
dynamics by which neural networks learn and forget examples throughout
training. Popular metrics derived from these dynamics include (i) the epoch at
which examples are first correctly classified; (ii) the number of times their
predictions flip during training; and (iii) whether their prediction flips if
they are held out. However, these metrics do not distinguish among examples
that are hard for distinct reasons, such as membership in a rare subpopulation,
being mislabeled, or belonging to a complex subpopulation. In this paper, we
propose second-split forgetting time (SSFT), a complementary metric
that tracks the epoch (if any) after which an original training example is
forgotten as the network is fine-tuned on a randomly held out partition of the
data. Across multiple benchmark datasets and modalities, we demonstrate that
mislabeled examples are forgotten quickly, and seemingly rare examples are
forgotten comparatively slowly. By contrast, metrics only considering the first
split learning dynamics struggle to differentiate the two. At large learning
rates, SSFT tends to be robust across architectures, optimizers, and random
seeds. From a practical standpoint, the SSFT can (i) help to identify
mislabeled samples, the removal of which improves generalization; and (ii)
provide insights about failure modes. Through theoretical analysis addressing
overparameterized linear models, we provide insights into how the observed
phenomena may arise. Code for reproducing our experiments can be found here:
https://github.com/pratyushmaini/ssft
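
For readers who want a concrete picture of the metric, below is a minimal Python/PyTorch sketch of the second-split procedure as described in the abstract: starting from a model already fitted to the first split, fine-tune on the held-out second split and record, for each first-split example, the first epoch at which it becomes misclassified. All names (`first_forgetting_epochs`, `first_split`, `second_split`, etc.) are illustrative placeholders rather than the authors' API, and the exact definition of SSFT (for instance, whether an example must remain misclassified thereafter) should be taken from the paper and the repository linked above.

```python
import torch
from torch.utils.data import DataLoader


def first_forgetting_epochs(model, first_split, second_split, optimizer, loss_fn,
                            n_epochs=50, device="cpu"):
    """Sketch of a second-split forgetting measurement.

    Assumes `model` has already been trained to fit `first_split`.
    While fine-tuning on `second_split`, record the first epoch at which each
    first-split example is misclassified ("forgotten"). Examples that are
    never forgotten keep the sentinel value -1. This is a simplified variant;
    the paper's exact definition of SSFT may differ.
    """
    forgetting_epoch = {i: -1 for i in range(len(first_split))}
    second_loader = DataLoader(second_split, batch_size=128, shuffle=True)
    eval_loader = DataLoader(first_split, batch_size=256, shuffle=False)

    for epoch in range(1, n_epochs + 1):
        # One epoch of fine-tuning on the second split.
        model.train()
        for x, y in second_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

        # Re-evaluate every first-split example after this epoch.
        model.eval()
        idx = 0
        with torch.no_grad():
            for x, y in eval_loader:
                preds = model(x.to(device)).argmax(dim=1).cpu()
                for correct in (preds == y).tolist():
                    if not correct and forgetting_epoch[idx] == -1:
                        forgetting_epoch[idx] = epoch
                    idx += 1
    return forgetting_epoch
```

Under the paper's findings, examples assigned small values by such a procedure are candidates for being mislabeled, and removing them before retraining corresponds to the filtering use case mentioned in the abstract.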
Related papers
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning (arXiv, 2024-01-12)
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay data from previously experienced tasks when learning new ones.
However, storing such data is often impractical because of memory constraints or data-privacy concerns.
As a replacement, data-free replay methods have been proposed that invert samples from the classification model.
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples (arXiv, 2023-08-26)
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples differ in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
- Toward Understanding Generative Data Augmentation (arXiv, 2023-05-27)
We show that generative data augmentation can enjoy a faster learning rate when the order of the divergence term is $o\left(\max\left(\log(m)\,\beta_m,\ 1/\sqrt{m}\right)\right)$.
We prove that in both cases, although generative data augmentation does not enjoy a faster learning rate, it can improve the learning guarantees at a constant level when the training set is small.
- Revisiting Discriminative vs. Generative Classifiers: Theory and Implications (arXiv, 2023-02-05)
This paper is inspired by the statistical efficiency of naive Bayes.
We present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for the logistic loss.
Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases.
- Dash: Semi-Supervised Learning with Dynamic Thresholding (arXiv, 2021-09-01)
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data.
- When does loss-based prioritization fail? (arXiv, 2021-07-16)
We show that loss-based acceleration methods degrade in scenarios with noisy and corrupted data.
Measures of example difficulty need to correctly separate noise from other types of challenging examples.
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision (arXiv, 2020-04-20)
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
- Instance Credibility Inference for Few-Shot Learning (arXiv, 2020-03-26)
Few-shot learning aims to recognize new objects with extremely limited training data for each category.
This paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI), to exploit the distribution support of unlabeled instances for few-shot learning.
Our simple approach establishes new state-of-the-art results on four widely used few-shot learning benchmark datasets.
- Robust and On-the-fly Dataset Denoising for Image Classification (arXiv, 2020-03-24)
On-the-fly Data Denoising (ODD) is robust to mislabeled examples while introducing almost zero computational overhead compared to standard training.
ODD achieves state-of-the-art results on a wide range of datasets, including real-world ones such as WebVision and Clothing1M.
This list is automatically generated from the titles and abstracts of the papers on this site.