Locally Adaptive Label Smoothing for Predictive Churn
- URL: http://arxiv.org/abs/2102.05140v1
- Date: Tue, 9 Feb 2021 21:38:37 GMT
- Title: Locally Adaptive Label Smoothing for Predictive Churn
- Authors: Dara Bahri and Heinrich Jiang
- Abstract summary: Training modern neural networks can lead to high prediction churn.
We present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example's label often outperforms the baselines on churn.
- Score: 40.17985689233356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training modern neural networks is an inherently noisy process that can lead
to high \emph{prediction churn} -- disagreements between re-trainings of the
same model due to factors such as randomization in the parameter initialization
and mini-batches -- even when the trained models all attain similar accuracies.
Such prediction churn can be very undesirable in practice. In this paper, we
present several baselines for reducing churn and show that training on soft
labels obtained by adaptively smoothing each example's label based on the
example's neighboring labels often outperforms the baselines on churn while
improving accuracy on a variety of benchmark classification tasks and model
architectures.
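The abstract describes the method only at a high level: each example's hard label is replaced by a soft label smoothed toward the labels of its neighbors. Below is a minimal sketch of one way such neighbor-based smoothing could look, assuming a k-nearest-neighbor lookup in a feature space and a per-example smoothing amount that grows when the neighbors disagree with the example's own label; the function name, `k`, and `alpha` are illustrative choices, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): mix each one-hot label with the
# empirical label distribution of its k nearest neighbours in feature space,
# smoothing more aggressively when the neighbours disagree with the label.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adaptive_soft_labels(features, labels, num_classes, k=10, alpha=0.1):
    """Return soft labels of shape (n_examples, num_classes)."""
    one_hot = np.eye(num_classes)[labels]                      # (n, C)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)                           # query includes self
    neighbor_dist = one_hot[idx[:, 1:]].mean(axis=1)           # (n, C), drop self
    # Per-example smoothing amount: larger when neighbours disagree with the label.
    agreement = neighbor_dist[np.arange(len(labels)), labels]  # (n,)
    eps = alpha * (1.0 - agreement)
    return (1.0 - eps)[:, None] * one_hot + eps[:, None] * neighbor_dist
```

The resulting soft labels would then replace the hard labels as targets for a standard soft-label cross-entropy loss during each (re-)training run.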
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple Logits Retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixup, as sketched below.
arXiv Detail & Related papers (2023-05-22T23:43:23Z)
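The SE entry above describes its soft targets in a single sentence, so the following is a hedged sketch under that description: per-instance soft labels linearly interpolate the model's predicted distribution with the one-hot label, and inputs and soft labels are then mixed up. The names `instance_soft_labels`, `mixup`, the `smooth` weight, and the Beta parameters are assumptions, not taken from the paper.

```python
# Illustrative sketch (assumed names and hyperparameters): instance-specific
# soft labels interpolate the model's output with the one-hot label, and the
# batch is then mixed up using those soft labels as targets.
import numpy as np

def instance_soft_labels(model_probs, labels, num_classes, smooth=0.1):
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - smooth) * one_hot + smooth * model_probs

def mixup(x, soft_labels, alpha=0.2, rng=None):
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                  # mixing coefficient
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * soft_labels + (1.0 - lam) * soft_labels[perm]
    return x_mix, y_mix
```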
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Asymmetric Co-teaching with Multi-view Consensus for Noisy Label Learning [15.690502285538411]
We introduce our noisy-label learning approach, called Asymmetric Co-teaching (AsyCo).
AsyCo produces more consistently divergent results between the co-teaching models.
Experiments on synthetic and real-world noisy-label datasets show that AsyCo improves over current SOTA methods.
arXiv Detail & Related papers (2023-01-01T04:10:03Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data.
Remarkably, this simple method achieves state-of-the-art test accuracy under label noise on a variety of real datasets.
arXiv Detail & Related papers (2022-02-28T18:50:10Z)
- Robust Neural Network Classification via Double Regularization [2.41710192205034]
We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model and an optimal reweighting of training observations.
We demonstrate DRFit, for neural net classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling.
arXiv Detail & Related papers (2021-12-15T13:19:20Z)
- Instance-based Label Smoothing For Better Calibrated Classification Networks [3.388509725285237]
Label smoothing is widely used in deep neural networks for multi-class classification.
We take inspiration from both label smoothing and self-distillation.
We propose two novel instance-based label smoothing approaches.
arXiv Detail & Related papers (2021-10-11T15:33:23Z)
- Generalization by Recognizing Confusion [3.018691733760647]
The self-adaptive training technique augments modern neural networks by allowing them to adjust training labels on the fly.
By combining the self-adaptive objective with mixup, we further improve the accuracy of self-adaptive models for image recognition (see the sketch after this list).
We find evidence that the Rademacher complexity of these algorithms is low, suggesting a new path towards provable generalization.
arXiv Detail & Related papers (2020-06-13T22:49:51Z)
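The last entry mentions adjusting training labels on the fly and combining that objective with mixup, but gives no details. The snippet below sketches one common instantiation of such on-the-fly adjustment, an exponential moving average of the model's predictions, and reuses the earlier `mixup` sketch for the combination; `momentum` and the names are assumptions, not the paper's.

```python
# Illustrative sketch: soft targets drift toward an exponential moving average
# of the model's predictions ("adjusting labels on the fly"); the adjusted
# targets can then be mixed up exactly as in the earlier sketch.
import numpy as np

def self_adaptive_update(targets, model_probs, momentum=0.9):
    """Blend current soft targets with the model's predicted distribution."""
    return momentum * targets + (1.0 - momentum) * model_probs

# Inside a training loop (illustrative):
#   targets[batch_idx] = self_adaptive_update(targets[batch_idx], probs)
#   x_mix, y_mix = mixup(x_batch, targets[batch_idx])   # see the earlier sketch
```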
This list is automatically generated from the titles and abstracts of the papers in this site.