When does loss-based prioritization fail?
- URL: http://arxiv.org/abs/2107.07741v1
- Date: Fri, 16 Jul 2021 07:23:15 GMT
- Title: When does loss-based prioritization fail?
- Authors: Niel Teng Hu, Xinyu Hu, Rosanne Liu, Sara Hooker, Jason Yosinski
- Abstract summary: We show that loss-based acceleration methods degrade in scenarios with noisy and corrupted data.
Measures of example difficulty need to correctly separate out noise from other types of challenging examples.
- Score: 18.982933391138268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Not all examples are created equal, but standard deep neural network training
protocols treat each training point uniformly. Each example is propagated
forward and backward through the network the same number of times, independent
of how much the example contributes to the learning protocol. Recent work has
proposed ways to accelerate training by deviating from this uniform treatment.
Popular methods entail up-weighting examples that contribute more to the loss
with the intuition that examples with low loss have already been learned by the
model, so their marginal value to the training procedure should be lower. This
view assumes that updating the model with high loss examples will be beneficial
to the model. However, this may not hold for noisy, real world data. In this
paper, we theorize and then empirically demonstrate that loss-based
acceleration methods degrade in scenarios with noisy and corrupted data. Our
work suggests measures of example difficulty need to correctly separate out
noise from other types of challenging examples.
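For concreteness, the sketch below shows the kind of loss-based prioritization the abstract describes: score each example in a batch by its current loss and spend the backward pass only on the highest-loss examples (in the spirit of selective-backprop-style acceleration). The function name, `keep_frac`, and the top-k selection rule are illustrative assumptions rather than the specific methods evaluated in the paper; under label noise, the "hardest" examples chosen this way are often the corrupted ones, which is exactly the failure mode the paper analyzes.

```python
import torch
import torch.nn as nn

def high_loss_selection_step(model, optimizer, inputs, targets, keep_frac=0.5):
    """One training step that backpropagates only the highest-loss examples.

    Illustrative sketch of loss-based prioritization (top-k selection);
    `keep_frac` and the selection rule are assumptions, not the paper's recipe.
    """
    criterion = nn.CrossEntropyLoss(reduction="none")  # per-example losses

    # Cheap forward pass to score every example in the batch.
    with torch.no_grad():
        losses = criterion(model(inputs), targets)

    # Keep the fraction of examples the model currently finds hardest.
    k = max(1, int(keep_frac * inputs.size(0)))
    hard_idx = torch.topk(losses, k).indices

    # Full forward/backward only on the selected (high-loss) examples.
    optimizer.zero_grad()
    loss = criterion(model(inputs[hard_idx]), targets[hard_idx]).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = nn.Linear(20, 5)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
    print(high_loss_selection_step(model, opt, x, y))
```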
Related papers
- Instance-dependent Early Stopping [57.912273923450726]
We propose an Instance-dependent Early Stopping (IES) method that adapts the early stopping mechanism from the entire training set to the instance level.
IES considers an instance as mastered if the second-order differences of its loss value remain within a small range around zero.
IES can reduce backpropagation instances by 10%-50% while maintaining or even slightly improving the test accuracy and transfer learning performance of a model.
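A minimal sketch of the mastered-instance test described above, assuming a logged per-example loss history; the tolerance `eps` and the number of second-order differences checked (`window`) are illustrative hyperparameters, not the paper's exact values.

```python
def is_mastered(loss_history, eps=1e-3, window=2):
    """Treat an instance as 'mastered' when the last `window` second-order
    differences of its loss values stay within a small band around zero
    (illustrative rule; `eps` and `window` are assumed hyperparameters)."""
    if len(loss_history) < window + 2:
        return False
    recent = loss_history[-(window + 2):]
    first_diff = [b - a for a, b in zip(recent, recent[1:])]
    second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]
    return all(abs(d) <= eps for d in second_diff)


# Example: a loss curve that has flattened out vs. one still changing.
print(is_mastered([0.9, 0.5, 0.2000, 0.2001, 0.2000, 0.1999]))  # True
print(is_mastered([0.9, 0.8, 0.6, 0.3, 0.1, 0.05]))             # False
```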
arXiv Detail & Related papers (2025-02-11T13:34:09Z)
- Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting [15.251425165987987]
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities.
We propose a sample weighting scheme for the fine-tuning data based on the pre-trained model's losses.
We empirically demonstrate the efficacy of our method on both language and vision tasks.
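A hedged sketch of a loss-based weighting of this flavor: score the fine-tuning data once with the frozen pre-trained model, then give easy (low-loss) samples larger weights. The exponential form and the `temperature` knob are assumptions for illustration, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def easy_sample_weights(pretrained_model, inputs, targets, temperature=1.0):
    """Per-example weights from the *pre-trained* model's losses (assumed form)."""
    losses = nn.functional.cross_entropy(
        pretrained_model(inputs), targets, reduction="none"
    )
    weights = torch.exp(-losses / temperature)  # easy (low-loss) samples get larger weight
    return weights / weights.sum()              # normalize to sum to 1


def weighted_finetune_step(model, optimizer, inputs, targets, weights):
    """One fine-tuning step using the fixed per-example weights."""
    optimizer.zero_grad()
    losses = nn.functional.cross_entropy(model(inputs), targets, reduction="none")
    (weights * losses).sum().backward()
    optimizer.step()
```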
arXiv Detail & Related papers (2025-02-05T00:49:59Z)
- Reducing Bias in Pre-trained Models by Tuning while Penalizing Change [8.862970622361747]
Deep models trained on large amounts of data often incorporate implicit biases present in their training data.
New data is often expensive and hard to come by in areas such as autonomous driving or medical decision-making.
We present a method based on change penalization that takes a pre-trained model and adapts the weights to mitigate a previously detected bias.
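A minimal sketch of tuning while penalizing change, assuming the penalty is a plain L2 distance between the current weights and a frozen copy of the pre-trained weights; `debias_loss_fn` and `penalty_weight` are illustrative placeholders, not the paper's exact formulation.

```python
import copy
import torch

def tune_with_change_penalty(model, debias_loss_fn, data_loader, lr=1e-3,
                             penalty_weight=10.0, epochs=1):
    """Adapt a pre-trained model while penalizing how far its weights move.

    Sketch under assumptions: the penalty is an L2 distance to the frozen
    pre-trained weights, and `penalty_weight` is an illustrative hyperparameter.
    """
    reference = copy.deepcopy(model)  # frozen snapshot of the pre-trained weights
    for p in reference.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, targets in data_loader:
            change = sum(
                ((p - q) ** 2).sum()
                for p, q in zip(model.parameters(), reference.parameters())
            )
            loss = debias_loss_fn(model(inputs), targets) + penalty_weight * change
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```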
arXiv Detail & Related papers (2024-04-18T16:12:38Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, one line of methods replays data from previously learned tasks while learning new ones.
In practice, however, storing that data is often infeasible due to memory constraints or data-privacy concerns.
As a replacement, data-free data replay methods synthesize replay samples by inverting them from the classification model itself.
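A generic sketch of what inverting samples from a classification model can look like: optimize random noise so a frozen classifier assigns it to a chosen class. The optimizer, step count, input shape, and the absence of the regularizers stronger inversion methods add are all simplifying assumptions, not this paper's procedure.

```python
import torch
import torch.nn as nn

def invert_samples(classifier, target_class, input_shape=(1, 3, 32, 32),
                   steps=200, lr=0.1):
    """Synthesize pseudo-samples for `target_class` by optimizing a learnable
    input until the frozen classifier labels it as that class (generic
    model-inversion sketch; feature-statistics regularizers are omitted)."""
    classifier.eval()
    x = torch.randn(input_shape, requires_grad=True)
    target = torch.full((input_shape[0],), target_class, dtype=torch.long)
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(classifier(x), target)
        loss.backward()
        optimizer.step()
    return x.detach()
```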
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
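A small sketch of the signal this observation relies on, assuming per-example correctness is logged at each epoch: the first epoch from which an example stays correctly classified tends to arrive much later for mislabeled examples than for clean ones. The logging format and helper name are illustrative, not the paper's implementation.

```python
def first_consistently_correct_epoch(correct_per_epoch):
    """Return the first epoch index from which an example is classified
    correctly at every subsequent epoch, or None if that never happens.

    `correct_per_epoch` is an assumed per-example list of booleans logged
    during training; mislabeled examples typically reach this point late.
    """
    start = None
    for epoch, correct in enumerate(correct_per_epoch):
        if correct and start is None:
            start = epoch
        elif not correct:
            start = None
    return start


print(first_consistently_correct_epoch([False, False, True, True, True]))   # 2
print(first_consistently_correct_epoch([False, True, False, True, False]))  # None
```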
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Task-Robust Pre-Training for Worst-Case Downstream Adaptation [62.05108162160981]
Pre-trained models have achieved remarkable success when transferred to downstream tasks.
This paper considers pre-training a model that guarantees uniformly good performance across downstream tasks.
arXiv Detail & Related papers (2023-06-21T07:43:23Z)
- Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
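A minimal sketch of computing SSFT from an assumed per-example log of whether a first-split example is still classified correctly at each epoch of second-split training; everything here other than the "epoch after which the example stays forgotten" idea is an illustrative choice.

```python
def second_split_forgetting_time(correct_per_epoch):
    """Epoch (if any) after which a first-split example stays misclassified
    while the model continues training on the second split.

    `correct_per_epoch` is an assumed per-example correctness log; returns
    None if the example is never permanently forgotten.
    """
    forgotten_at = None
    for epoch, correct in enumerate(correct_per_epoch):
        if not correct and forgotten_at is None:
            forgotten_at = epoch
        elif correct:
            forgotten_at = None  # recovered, so not permanently forgotten yet
    return forgotten_at


# Mislabeled examples tend to be forgotten quickly; rare-but-clean ones slowly.
print(second_split_forgetting_time([True, False, False, False]))  # 1
print(second_split_forgetting_time([True, True, True, True]))     # None
```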
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
- Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond [21.594200327544968]
We present a flexible approach to learning from noisy examples.
Specifically, we treat each training example as an expert and maintain a distribution over all examples.
Unlike other related methods, our approach handles a general class of loss functions and can be applied to a wide range of noise types and applications.
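A hedged sketch of an exponentiated-gradient style reweighting over examples-as-experts: maintain a distribution over training examples and apply a multiplicative update driven by the current losses. The update direction (down-weighting persistently high-loss, likely-noisy examples) and `step_size` are assumptions for illustration, not the paper's exact algorithm.

```python
import torch

def eg_reweight(example_weights, example_losses, step_size=0.1):
    """One exponentiated-gradient style update of a distribution over examples.

    Sketch under assumptions: examples act as 'experts', and the multiplicative
    update shrinks the weight of examples with high current loss.
    """
    new_weights = example_weights * torch.exp(-step_size * example_losses)
    return new_weights / new_weights.sum()  # renormalize to a distribution


# Start uniform; the example with the largest loss loses probability mass.
w = torch.full((4,), 0.25)
losses = torch.tensor([0.1, 0.2, 0.1, 3.0])
print(eg_reweight(w, losses))
```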
arXiv Detail & Related papers (2021-04-03T22:54:49Z)
- Robust and On-the-fly Dataset Denoising for Image Classification [72.10311040730815]
On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training.
ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
arXiv Detail & Related papers (2020-03-24T03:59:26Z)