Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It
- URL: http://arxiv.org/abs/2403.14715v3
- Date: Thu, 20 Feb 2025 15:02:44 GMT
- Title: Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It
- Authors: Guoxuan Xia, Olivier Laurent, Gianni Franchi, Christos-Savvas Bouganis
- Abstract summary: Label smoothing (LS) is a popular regularisation method for training neural networks.
LS degrades the uncertainty rank ordering of correct vs incorrect predictions.
We provide an explanation for this behaviour by analysing logit-level gradients.
- Score: 6.19039575840278
- Abstract: Label smoothing (LS) is a popular regularisation method for training neural networks as it is effective in improving test accuracy and is simple to implement. "Hard" one-hot labels are "smoothed" by uniformly distributing probability mass to other classes, reducing overfitting. Prior work has suggested that in some cases LS can degrade selective classification (SC) -- where the aim is to reject misclassifications using a model's uncertainty. In this work, we first demonstrate empirically across an extended range of large-scale tasks and architectures that LS consistently degrades SC. We then address a gap in existing knowledge, providing an explanation for this behaviour by analysing logit-level gradients: LS degrades the uncertainty rank ordering of correct vs incorrect predictions by suppressing the max logit more when a prediction is likely to be correct, and less when it is likely to be wrong. This elucidates previously reported experimental results where strong classifiers underperform in SC. We then demonstrate the empirical effectiveness of post-hoc logit normalisation for recovering lost SC performance caused by LS. Furthermore, linking back to our gradient analysis, we again provide an explanation for why such normalisation is effective.
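To make the paper's fix concrete, here is a minimal NumPy sketch of selective classification scored with and without post-hoc logit normalisation. The synthetic logits, the choice of L2 norm, and the acceptance threshold are illustrative assumptions; the paper treats the normalisation details (such as the norm order) as tunable rather than fixed.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp(logits):
    # Maximum softmax probability: the standard selective-classification score.
    return softmax(logits).max(axis=-1)

def msp_logitnorm(logits, p=2, eps=1e-12):
    # Post-hoc logit normalisation: rescale each logit vector by its p-norm
    # before scoring; p = 2 here is an illustrative choice.
    norms = np.linalg.norm(logits, ord=p, axis=-1, keepdims=True)
    return softmax(logits / (norms + eps)).max(axis=-1)

def selective_risk(score, correct, threshold):
    # Accept predictions scoring above the threshold; report coverage
    # (fraction accepted) and risk (error rate among the accepted).
    accept = score > threshold
    if not accept.any():
        return 0.0, 0.0
    return float(accept.mean()), float(1.0 - correct[accept].mean())

# Synthetic stand-ins for a trained model's outputs and the true labels.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
correct = logits.argmax(axis=-1) == rng.integers(0, 10, size=1000)
print(selective_risk(msp(logits), correct, threshold=0.15))
print(selective_risk(msp_logitnorm(logits), correct, threshold=0.15))
```

With a real LS-trained model, one would substitute its logits for the synthetic ones and sweep the threshold to trace a risk-coverage curve.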
Related papers
- Adaptive Label Smoothing for Out-of-Distribution Detection [1.5999407512883508]
We propose a novel regularization method called adaptive label smoothing (ALS).
ALS pushes the non-true classes to have the same probability, while the maximal probability is neither fixed nor limited (a sketch of such targets follows below).
Our code will be available to the public.
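For contrast with the uniform smoothing described in the main abstract above, the sketch below constructs such targets; passing the true-class mass per sample is a hypothetical stand-in for what ALS actually learns.

```python
import numpy as np

def smoothed_targets(label, num_classes, true_mass):
    # Uniform LS fixes true_mass (e.g. 1 - epsilon) for every sample; the
    # ALS idea summarised above keeps the non-true classes uniform while
    # leaving the true-class mass neither fixed nor capped.
    targets = np.full(num_classes, (1.0 - true_mass) / (num_classes - 1))
    targets[label] = true_mass
    return targets

print(smoothed_targets(label=2, num_classes=5, true_mass=0.9))   # LS-like target
print(smoothed_targets(label=2, num_classes=5, true_mass=0.99))  # sharper target
```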
arXiv Detail & Related papers (2024-10-08T15:35:11Z) - Do not trust what you trust: Miscalibration in Semi-supervised Learning [21.20806568508201]
State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples.
We show that SSL methods based on pseudo-labels are significantly miscalibrated, and formally demonstrate that this stems from minimizing the min-entropy, a lower bound of the Shannon entropy.
We integrate a simple penalty term, which encourages the logits of predictions on unlabeled samples to remain low, preventing the network's predictions from becoming overconfident (a hypothetical rendering follows below).
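A hypothetical rendering of such a penalty; the squared-logit form and the weight `lam` are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def penalised_loss(supervised_loss, unlabeled_logits, lam=0.1):
    # Keeping unlabeled logits small caps how peaked the softmax on those
    # samples can become, discouraging overconfident pseudo-labels.
    penalty = float(np.mean(unlabeled_logits ** 2))
    return supervised_loss + lam * penalty

print(penalised_loss(0.42, np.array([[3.0, -1.0, 0.5]])))
```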
arXiv Detail & Related papers (2024-03-22T18:43:46Z) - Model Calibration in Dense Classification with Adaptive Label Perturbation [44.62722402349157]
Existing dense binary classification models are prone to being over-confident.
We propose Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image.
ASLP can significantly improve the calibration of dense binary classification models on both in-distribution and out-of-distribution data.
arXiv Detail & Related papers (2023-07-25T14:40:11Z) - When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples.
A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction.
Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice (see the sketch below).
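A minimal sketch of such a deferral rule, assuming each stage exposes class probabilities; the two-stage setup, the stand-in classifiers, and the threshold are illustrative.

```python
import numpy as np

def cascade_predict(x, models, thresholds):
    # Confidence-based deferral: stop at the first model whose maximum
    # softmax probability clears its threshold; otherwise defer to the
    # next (typically larger) model. The final model always answers.
    for model, tau in zip(models[:-1], thresholds):
        probs = model(x)
        if probs.max() > tau:
            return int(probs.argmax())
    return int(models[-1](x).argmax())

small = lambda x: np.array([0.7, 0.2, 0.1])  # stand-in early classifier
large = lambda x: np.array([0.4, 0.5, 0.1])  # stand-in late classifier
print(cascade_predict(None, [small, large], thresholds=[0.8]))  # defers to large
```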
arXiv Detail & Related papers (2023-07-06T04:13:57Z) - Taming Overconfident Prediction on Unlabeled Data from Hindsight [50.9088560433925]
Minimizing prediction uncertainty on unlabeled data is a key factor to achieve good performance in semi-supervised learning.
This paper proposes a dual mechanism, named ADaptive Sharpening (ADS), which first applies a soft-threshold to adaptively mask out determinate and negligible predictions.
ADS significantly improves state-of-the-art SSL methods when used as a plug-in (a hypothetical sketch follows below).
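A hypothetical sketch of the idea as summarised above; the thresholds and temperature here are assumptions, and the real ADS sets its threshold adaptively.

```python
import numpy as np

def adaptive_sharpen(probs, low=0.2, high=0.95, temperature=0.5):
    # Leave already-determinate predictions (max prob >= high) and
    # negligible ones (max prob <= low) untouched; sharpen only the rest
    # with a temperature-scaled renormalisation.
    conf = probs.max(axis=-1)
    mask = (conf > low) & (conf < high)
    sharpened = probs ** (1.0 / temperature)
    sharpened /= sharpened.sum(axis=-1, keepdims=True)
    return np.where(mask[:, None], sharpened, probs)

probs = np.array([[0.5, 0.3, 0.2], [0.98, 0.01, 0.01]])
print(adaptive_sharpen(probs))  # only the first (uncertain) row is sharpened
```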
arXiv Detail & Related papers (2021-12-15T15:17:02Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance-Guided Stochastic Gradient Descent (IGSGD) method to train models to infer directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Re-Assessing the "Classify and Count" Quantification Method [88.60021378715636]
"Classify and Count" (CC) is often a biased estimator.
Previous works have failed to use properly optimised versions of CC.
We argue that properly optimised versions of CC, while still inferior to some cutting-edge methods, deliver near-state-of-the-art accuracy (see the sketch below).
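For reference, plain CC and one textbook bias correction (the "adjusted" variant, here for the binary case) can be sketched as follows; this illustrates the method family, not this paper's specific optimised variants.

```python
import numpy as np

def classify_and_count(predictions, num_classes):
    # Plain CC: estimate class prevalences by counting hard predictions.
    return np.bincount(predictions, minlength=num_classes) / len(predictions)

def adjusted_cc(pos_rate, tpr, fpr):
    # Adjusted CC for the binary case: invert
    # observed_rate = tpr * p + fpr * (1 - p) to recover the prevalence p.
    return float(np.clip((pos_rate - fpr) / (tpr - fpr), 0.0, 1.0))

preds = np.array([0, 1, 1, 0, 1, 1])
print(classify_and_count(preds, num_classes=2))        # raw prevalence estimate
print(adjusted_cc(pos_rate=4 / 6, tpr=0.9, fpr=0.2))   # bias-corrected estimate
```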
arXiv Detail & Related papers (2020-11-04T21:47:39Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more than the unbiasedness of the risk estimator in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.