When does loss-based prioritization fail?
- URL: http://arxiv.org/abs/2107.07741v1
- Date: Fri, 16 Jul 2021 07:23:15 GMT
- Title: When does loss-based prioritization fail?
- Authors: Niel Teng Hu, Xinyu Hu, Rosanne Liu, Sara Hooker, Jason Yosinski
- Abstract summary: We show that loss-based acceleration methods degrade in scenarios with noisy and corrupted data.
Measures of example difficulty need to correctly separate out noise from other types of challenging examples.
- Score: 18.982933391138268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Not all examples are created equal, but standard deep neural network training
protocols treat each training point uniformly. Each example is propagated
forward and backward through the network the same number of times, independent
of how much the example contributes to the learning protocol. Recent work has
proposed ways to accelerate training by deviating from this uniform treatment.
Popular methods entail up-weighting examples that contribute more to the loss
with the intuition that examples with low loss have already been learned by the
model, so their marginal value to the training procedure should be lower. This
view assumes that updating the model with high loss examples will be beneficial
to the model. However, this may not hold for noisy, real-world data. In this
paper, we theorize and then empirically demonstrate that loss-based
acceleration methods degrade in scenarios with noisy and corrupted data. Our
work suggests measures of example difficulty need to correctly separate out
noise from other types of challenging examples.
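To make the critiqued scheme concrete, here is a minimal sketch of loss-based prioritization in the style of selective backprop: each batch, only the highest-loss examples receive a gradient update. The function name and the `top_frac` parameter are illustrative, not from the paper; the point is that under label noise this selection rule increasingly favors mislabeled points.

```python
import torch
import torch.nn.functional as F

def high_loss_update(model, optimizer, inputs, labels, top_frac=0.5):
    """One training step that backpropagates only the highest-loss
    examples in the batch (illustrative; not the paper's code)."""
    logits = model(inputs)
    # Per-example losses (no reduction) so examples can be ranked.
    per_example_loss = F.cross_entropy(logits, labels, reduction="none")

    # Keep the top `top_frac` fraction by loss; with noisy labels,
    # mislabeled points tend to dominate this selection.
    k = max(1, int(top_frac * inputs.size(0)))
    top_losses, _ = per_example_loss.topk(k)

    optimizer.zero_grad()
    top_losses.mean().backward()
    optimizer.step()
    return per_example_loss.detach()
```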
Related papers
- Reducing Bias in Pre-trained Models by Tuning while Penalizing Change [8.862970622361747]
Deep models trained on large amounts of data often incorporate implicit biases present during training time.
New data is often expensive and hard to come by in areas such as autonomous driving or medical decision-making.
We present a method based on change penalization that takes a pre-trained model and adapts the weights to mitigate a previously detected bias.
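A minimal sketch of the change-penalization idea, assuming a simple L2 anchor to the pre-trained weights (the helper name and `lam` are hypothetical; the paper's penalty may be more targeted):

```python
import torch

def change_penalty(model, ref_params, lam=1e-3):
    """L2 penalty anchoring fine-tuned weights to pre-trained ones."""
    drift = sum(((p - p0) ** 2).sum()
                for p, p0 in zip(model.parameters(), ref_params))
    return lam * drift

# Usage sketch: snapshot the pre-trained weights once, then penalize
# change while fine-tuning on the debiasing objective.
# ref_params = [p.detach().clone() for p in model.parameters()]
# loss = task_loss + change_penalty(model, ref_params)
```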
arXiv Detail & Related papers (2024-04-18T16:12:38Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, replaying stored data is often impractical due to memory constraints or data-privacy concerns.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
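A generic sketch of inverting samples from a classifier, assuming plain gradient ascent on a class logit (names and hyperparameters are illustrative; the paper's data-free replay method is more elaborate):

```python
import torch

def invert_class(model, target_class, shape=(1, 3, 32, 32),
                 steps=200, lr=0.1):
    """Synthesize a pseudo-sample for `target_class` by gradient
    ascent on the classifier's logit (generic inversion sketch)."""
    model.eval()
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Maximize the target logit by minimizing its negative.
        loss = -model(x)[:, target_class].mean()
        loss.backward()
        opt.step()
    return x.detach()
```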
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
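The summary only hints at the mechanism; as a rough, hypothetical illustration, here is single-step (FGSM-style) adversarial example generation where the gradient flows through a randomly sampled subset of residual-style blocks, standing in for subnetwork sampling:

```python
import random
import torch
import torch.nn.functional as F

def fgsm_via_sampled_blocks(blocks, head, x, y, eps=8 / 255, keep=0.5):
    """Single-step adversarial example whose gradient is computed
    through a random subset of blocks (hypothetical stand-in for
    the paper's subnetwork sampling)."""
    x_adv = x.clone().requires_grad_(True)
    h = x_adv
    for block in blocks:
        if random.random() < keep:  # skipped blocks cost no compute
            h = block(h)            # assumes shape-preserving blocks
    loss = F.cross_entropy(head(h), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).detach()
```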
arXiv Detail & Related papers (2023-10-24T01:36:20Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
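A hypothetical sketch of the statistic this builds on: record, per example, the first epoch from which predictions stay correct in every later epoch. Mislabeled examples should end up with late values (all names are illustrative):

```python
import torch

class ConsistencyTracker:
    """Tracks the first epoch from which each example's prediction
    stays correct in all subsequent epochs (illustrative sketch)."""
    def __init__(self, num_examples):
        self.first_consistent = torch.zeros(num_examples, dtype=torch.long)

    def update(self, epoch, indices, preds, labels):
        missed = ~preds.eq(labels)
        # A miss resets the "consistently correct since" epoch.
        self.first_consistent[indices[missed]] = epoch + 1
```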
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Task-Robust Pre-Training for Worst-Case Downstream Adaptation [62.05108162160981]
Pre-training has achieved remarkable success when transferred to downstream tasks.
This paper considers pre-training a model that guarantees uniformly good performance over the downstream tasks.
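A minimal sketch of the uniform-performance objective, assuming the simplest minimax form, optimizing the worst task loss rather than the average (not necessarily the paper's formulation):

```python
import torch

def worst_case_loss(task_losses):
    """Minimax objective: optimize the worst task rather than the
    average, so no single downstream task is sacrificed."""
    return torch.stack(task_losses).max()

# loss = worst_case_loss([loss_task_a, loss_task_b, loss_task_c])
# loss.backward()
```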
arXiv Detail & Related papers (2023-06-21T07:43:23Z)
- Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, while seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
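A hypothetical sketch of recording SSFT: while training continues on a second split, note the epoch at which each first-split example flips from correct to incorrect; per the paper, mislabeled examples should show small values:

```python
import torch

class SSFTRecorder:
    """Records the epoch at which each first-split example is first
    forgotten while training continues on a second split (sketch)."""
    def __init__(self, num_examples):
        self.was_correct = torch.zeros(num_examples, dtype=torch.bool)
        self.forget_epoch = torch.full((num_examples,), -1, dtype=torch.long)

    def update(self, epoch, indices, preds, labels):
        correct = preds.eq(labels)
        # Forgotten now: correct last time, wrong now, not yet recorded.
        forgotten = (self.was_correct[indices] & ~correct
                     & (self.forget_epoch[indices] == -1))
        self.forget_epoch[indices[forgotten]] = epoch
        self.was_correct[indices] = correct
```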
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
- DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrectly labeled examples), deep neural networks gradually memorize the noisy labels, impairing model performance.
To mitigate this issue, curriculum learning is proposed to improve performance and generalization by ordering training samples in a meaningful sequence.
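As a generic illustration of loss-ordered curricula (not DiscrimLoss itself), a batch can be ranked easiest-first by current loss, since clean examples tend to have smaller loss early in training:

```python
import torch
import torch.nn.functional as F

def small_loss_order(model, inputs, labels):
    """Rank a batch easiest-first by current loss (generic
    curriculum sketch, not DiscrimLoss itself)."""
    with torch.no_grad():
        losses = F.cross_entropy(model(inputs), labels, reduction="none")
    order = losses.argsort()  # ascending: likely-clean examples first
    return inputs[order], labels[order]
```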
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
- Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond [21.594200327544968]
We present a flexible approach to learning from noisy examples.
Specifically, we treat each training example as an expert and maintain a distribution over all examples.
Unlike other related methods, our approach handles a general class of loss functions and can be applied to a wide range of noise types and applications.
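A Hedge-style sketch of the experts view, assuming a multiplicative update of example weights from per-example losses (the paper's exact update may differ):

```python
import torch

class ExampleReweighter:
    """Distribution over training examples with a multiplicative,
    Hedge-style update from per-example losses (illustrative)."""
    def __init__(self, num_examples, eta=0.1):
        self.eta = eta
        self.weights = torch.full((num_examples,), 1.0 / num_examples)

    def update(self, indices, losses):
        # Persistently high-loss (likely noisy) examples get down-weighted.
        self.weights[indices] *= torch.exp(-self.eta * losses)
        self.weights /= self.weights.sum()
```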
arXiv Detail & Related papers (2021-04-03T22:54:49Z)
- Robust and On-the-fly Dataset Denoising for Image Classification [72.10311040730815]
On-the-fly Data Denoising (ODD) is robust to mislabeled examples, while introducing almost zero computational overhead compared to standard training.
ODD is able to achieve state-of-the-art results on a wide range of datasets including real-world ones such as WebVision and Clothing1M.
arXiv Detail & Related papers (2020-03-24T03:59:26Z)