When does loss-based prioritization fail?
- URL: http://arxiv.org/abs/2107.07741v1
- Date: Fri, 16 Jul 2021 07:23:15 GMT
- Title: When does loss-based prioritization fail?
- Authors: Niel Teng Hu, Xinyu Hu, Rosanne Liu, Sara Hooker, Jason Yosinski
- Abstract summary: We show that loss-based acceleration methods degrade in scenarios with noisy and corrupted data.
Measures of example difficulty need to correctly separate out noise from other types of challenging examples.
- Score: 18.982933391138268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Not all examples are created equal, but standard deep neural network training
protocols treat each training point uniformly. Each example is propagated
forward and backward through the network the same number of times, independent
of how much the example contributes to the learning protocol. Recent work has
proposed ways to accelerate training by deviating from this uniform treatment.
Popular methods entail up-weighting examples that contribute more to the loss
with the intuition that examples with low loss have already been learned by the
model, so their marginal value to the training procedure should be lower. This
view assumes that updating the model with high loss examples will be beneficial
to the model. However, this may not hold for noisy, real world data. In this
paper, we theorize and then empirically demonstrate that loss-based
acceleration methods degrade in scenarios with noisy and corrupted data. Our
work suggests measures of example difficulty need to correctly separate out
noise from other types of challenging examples.
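For concreteness, the sketch below shows the kind of loss-based prioritization the abstract describes: score each example in a batch by its current loss and spend the backward pass only on the highest-loss examples (in the spirit of selective-backprop-style acceleration). The function name, `keep_frac`, and the top-k selection rule are illustrative assumptions rather than the specific methods evaluated in the paper; under label noise, the "hardest" examples chosen this way are often the corrupted ones, which is exactly the failure mode the paper analyzes.

```python
import torch
import torch.nn as nn

def high_loss_selection_step(model, optimizer, inputs, targets, keep_frac=0.5):
    """One training step that backpropagates only the highest-loss examples.

    Illustrative sketch of loss-based prioritization (top-k selection);
    `keep_frac` and the selection rule are assumptions, not the paper's recipe.
    """
    criterion = nn.CrossEntropyLoss(reduction="none")  # per-example losses

    # Cheap forward pass to score every example in the batch.
    with torch.no_grad():
        losses = criterion(model(inputs), targets)

    # Keep the fraction of examples the model currently finds hardest.
    k = max(1, int(keep_frac * inputs.size(0)))
    hard_idx = torch.topk(losses, k).indices

    # Full forward/backward only on the selected (high-loss) examples.
    optimizer.zero_grad()
    loss = criterion(model(inputs[hard_idx]), targets[hard_idx]).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = nn.Linear(20, 5)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
    print(high_loss_selection_step(model, opt, x, y))
```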
Related papers
- Instance-dependent Early Stopping [57.912273923450726]
We propose an Instance-dependent Early Stopping (IES) method that adapts the early stopping mechanism from the entire training set to the instance level.
IES considers an instance as mastered if the second-order differences of its loss value remain within a small range around zero.
IES can reduce backpropagation instances by 10%-50% while maintaining or even slightly improving the test accuracy and transfer learning performance of a model.
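A minimal sketch of the mastered-instance test described above, assuming a logged per-example loss history; the tolerance `eps` and the number of second-order differences checked (`window`) are illustrative hyperparameters, not the paper's exact values.

```python
def is_mastered(loss_history, eps=1e-3, window=2):
    """Treat an instance as 'mastered' when the last `window` second-order
    differences of its loss values stay within a small band around zero
    (illustrative rule; `eps` and `window` are assumed hyperparameters)."""
    if len(loss_history) < window + 2:
        return False
    recent = loss_history[-(window + 2):]
    first_diff = [b - a for a, b in zip(recent, recent[1:])]
    second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]
    return all(abs(d) <= eps for d in second_diff)


# Example: a loss curve that has flattened out vs. one still changing.
print(is_mastered([0.9, 0.5, 0.2000, 0.2001, 0.2000, 0.1999]))  # True
print(is_mastered([0.9, 0.8, 0.6, 0.3, 0.1, 0.05]))             # False
```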
arXiv Detail & Related papers (2025-02-11T13:34:09Z)
- Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting [15.251425165987987]
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities.
We propose a sample weighting scheme for the fine-tuning data based on the pre-trained model's losses.
We empirically demonstrate the efficacy of our method on both language and vision tasks.
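A hedged sketch of a loss-based weighting of this flavor: score the fine-tuning data once with the frozen pre-trained model, then give easy (low-loss) samples larger weights. The exponential form and the `temperature` knob are assumptions for illustration, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def easy_sample_weights(pretrained_model, inputs, targets, temperature=1.0):
    """Per-example weights from the *pre-trained* model's losses (assumed form)."""
    losses = nn.functional.cross_entropy(
        pretrained_model(inputs), targets, reduction="none"
    )
    weights = torch.exp(-losses / temperature)  # easy (low-loss) samples get larger weight
    return weights / weights.sum()              # normalize to sum to 1


def weighted_finetune_step(model, optimizer, inputs, targets, weights):
    """One fine-tuning step using the fixed per-example weights."""
    optimizer.zero_grad()
    losses = nn.functional.cross_entropy(model(inputs), targets, reduction="none")
    (weights * losses).sum().backward()
    optimizer.step()
```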
arXiv Detail & Related papers (2025-02-05T00:49:59Z)
- Reducing Bias in Pre-trained Models by Tuning while Penalizing Change [8.862970622361747]
Deep models trained on large amounts of data often incorporate implicit biases present in their training data.
New data is often expensive and hard to come by in areas such as autonomous driving or medical decision-making.
We present a method based on change penalization that takes a pre-trained model and adapts the weights to mitigate a previously detected bias.
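A minimal sketch of tuning while penalizing change, assuming the penalty is a plain L2 distance between the current weights and a frozen copy of the pre-trained weights; `debias_loss_fn` and `penalty_weight` are illustrative placeholders, not the paper's exact formulation.

```python
import copy
import torch

def tune_with_change_penalty(model, debias_loss_fn, data_loader, lr=1e-3,
                             penalty_weight=10.0, epochs=1):
    """Adapt a pre-trained model while penalizing how far its weights move.

    Sketch under assumptions: the penalty is an L2 distance to the frozen
    pre-trained weights, and `penalty_weight` is an illustrative hyperparameter.
    """
    reference = copy.deepcopy(model)  # frozen snapshot of the pre-trained weights
    for p in reference.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, targets in data_loader:
            change = sum(
                ((p - q) ** 2).sum()
                for p, q in zip(model.parameters(), reference.parameters())
            )
            loss = debias_loss_fn(model(inputs), targets) + penalty_weight * change
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```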
arXiv Detail & Related papers (2024-04-18T16:12:38Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, one line of methods replays data from previously learned tasks while learning new ones.
In practice, however, storing that data is often infeasible due to memory constraints or data-privacy concerns.
As a replacement, data-free data replay methods synthesize replay samples by inverting them from the classification model itself.
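A generic sketch of what inverting samples from a classification model can look like: optimize random noise so a frozen classifier assigns it to a chosen class. The optimizer, step count, input shape, and the absence of the regularizers stronger inversion methods add are all simplifying assumptions, not this paper's procedure.

```python
import torch
import torch.nn as nn

def invert_samples(classifier, target_class, input_shape=(1, 3, 32, 32),
                   steps=200, lr=0.1):
    """Synthesize pseudo-samples for `target_class` by optimizing a learnable
    input until the frozen classifier labels it as that class (generic
    model-inversion sketch; feature-statistics regularizers are omitted)."""
    classifier.eval()
    x = torch.randn(input_shape, requires_grad=True)
    target = torch.full((input_shape[0],), target_class, dtype=torch.long)
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(classifier(x), target)
        loss.backward()
        optimizer.step()
    return x.detach()
```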
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
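A small sketch of the signal this observation relies on, assuming per-example correctness is logged at each epoch: the first epoch from which an example stays correctly classified tends to arrive much later for mislabeled examples than for clean ones. The logging format and helper name are illustrative, not the paper's implementation.

```python
def first_consistently_correct_epoch(correct_per_epoch):
    """Return the first epoch index from which an example is classified
    correctly at every subsequent epoch, or None if that never happens.

    `correct_per_epoch` is an assumed per-example list of booleans logged
    during training; mislabeled examples typically reach this point late.
    """
    start = None
    for epoch, correct in enumerate(correct_per_epoch):
        if correct and start is None:
            start = epoch
        elif not correct:
            start = None
    return start


print(first_consistently_correct_epoch([False, False, True, True, True]))   # 2
print(first_consistently_correct_epoch([False, True, False, True, False]))  # None
```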
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Task-Robust Pre-Training for Worst-Case Downstream Adaptation [62.05108162160981]
Pre-trained models have achieved remarkable success when transferred to downstream tasks.
This paper considers pre-training a model that guarantees uniformly good performance across downstream tasks.
arXiv Detail & Related papers (2023-06-21T07:43:23Z)
- Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
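A minimal sketch of computing SSFT from an assumed per-example log of whether a first-split example is still classified correctly at each epoch of second-split training; everything here other than the "epoch after which the example stays forgotten" idea is an illustrative choice.

```python
def second_split_forgetting_time(correct_per_epoch):
    """Epoch (if any) after which a first-split example stays misclassified
    while the model continues training on the second split.

    `correct_per_epoch` is an assumed per-example correctness log; returns
    None if the example is never permanently forgotten.
    """
    forgotten_at = None
    for epoch, correct in enumerate(correct_per_epoch):
        if not correct and forgotten_at is None:
            forgotten_at = epoch
        elif correct:
            forgotten_at = None  # recovered, so not permanently forgotten yet
    return forgotten_at


# Mislabeled examples tend to be forgotten quickly; rare-but-clean ones slowly.
print(second_split_forgetting_time([True, False, False, False]))  # 1
print(second_split_forgetting_time([True, True, True, True]))     # None
```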
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
- Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond [21.594200327544968]
We present a flexible approach to learning from noisy examples.
Specifically, we treat each training example as an expert and maintain a distribution over all examples.
Unlike other related methods, our approach handles a general class of loss functions and can be applied to a wide range of noise types and applications.
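A hedged sketch of an exponentiated-gradient style reweighting over examples-as-experts: maintain a distribution over training examples and apply a multiplicative update driven by the current losses. The update direction (down-weighting persistently high-loss, likely-noisy examples) and `step_size` are assumptions for illustration, not the paper's exact algorithm.

```python
import torch

def eg_reweight(example_weights, example_losses, step_size=0.1):
    """One exponentiated-gradient style update of a distribution over examples.

    Sketch under assumptions: examples act as 'experts', and the multiplicative
    update shrinks the weight of examples with high current loss.
    """
    new_weights = example_weights * torch.exp(-step_size * example_losses)
    return new_weights / new_weights.sum()  # renormalize to a distribution


# Start uniform; the example with the largest loss loses probability mass.
w = torch.full((4,), 0.25)
losses = torch.tensor([0.1, 0.2, 0.1, 3.0])
print(eg_reweight(w, losses))
```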
arXiv Detail & Related papers (2021-04-03T22:54:49Z)
- Robust and On-the-fly Dataset Denoising for Image Classification [72.10311040730815]
On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training.
ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
arXiv Detail & Related papers (2020-03-24T03:59:26Z)