Regularizing Class-wise Predictions via Self-knowledge Distillation
- URL: http://arxiv.org/abs/2003.13964v2
- Date: Tue, 7 Apr 2020 05:28:07 GMT
- Title: Regularizing Class-wise Predictions via Self-knowledge Distillation
- Authors: Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin
- Abstract summary: We propose a new regularization method that penalizes the predictive distribution between similar samples.
This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network.
Our experimental results on various image classification tasks demonstrate that the simple yet powerful method can significantly improve the generalization ability.
- Score: 80.76254453115766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks with millions of parameters may suffer from poor
generalization due to overfitting. To mitigate the issue, we propose a new
regularization method that penalizes the predictive distribution between
similar samples. In particular, we distill the predictive distribution between
different samples of the same label during training. This results in
regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a
single network (i.e., a self-knowledge distillation) by forcing it to produce
more meaningful and consistent predictions in a class-wise manner.
Consequently, it mitigates overconfident predictions and reduces intra-class
variations. Our experimental results on various image classification tasks
demonstrate that the simple yet powerful method can significantly improve not
only the generalization ability but also the calibration performance of modern
convolutional neural networks.
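The class-wise regularizer described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' reference implementation; the function names, the temperature value, and the exact loss scaling are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def class_wise_kd_loss(logits_a, logits_b, T=4.0):
    """KL divergence between the predictive distributions of two
    *different* samples sharing the same label. During training the
    second distribution acts as a fixed target (gradients stopped),
    so only the first sample's prediction is pulled toward it,
    regularizing the "dark knowledge" on the wrong classes.
    """
    p = softmax(logits_a, T)  # prediction being regularized
    q = softmax(logits_b, T)  # same-class target (detached in practice)
    # KL(q || p), rescaled by T^2 as is conventional in distillation losses
    return float(np.sum(q * (np.log(q) - np.log(p))) * T * T)

# Two samples of the same class with similar but not identical logits
z1 = np.array([2.0, 0.5, -1.0])
z2 = np.array([1.8, 0.7, -0.9])
loss = class_wise_kd_loss(z1, z2)
```

In the paper's setting this term would be added, with a weighting coefficient, to the usual cross-entropy loss; the KL term shrinks to zero as same-class predictions become consistent, which is what reduces intra-class variation.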
Related papers
- GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines the usage of gradient information and invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
- Multi-Head Multi-Loss Model Calibration [13.841172927454204]
We introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles.
Specifically, each head is trained to minimize a weighted Cross-Entropy loss, but the weights are different among the different branches.
We show that the resulting averaged predictions can achieve excellent calibration without sacrificing accuracy in two challenging datasets.
arXiv Detail & Related papers (2023-03-02T09:32:32Z)
- On double-descent in uncertainty quantification in overparametrized models [24.073221004661427]
Uncertainty quantification is a central challenge in reliable and trustworthy machine learning.
We show a trade-off between classification accuracy and calibration, unveiling a double descent like behavior in the calibration curve of optimally regularized estimators.
This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite the higher generalization error and overparametrization.
arXiv Detail & Related papers (2022-10-23T16:01:08Z)
- Confidence estimation of classification based on the distribution of the neural network output layer [4.529188601556233]
One of the most common problems preventing the application of prediction models in the real world is lack of generalization.
We propose novel methods that estimate uncertainty of particular predictions generated by a neural network classification model.
The proposed methods infer the confidence of a particular prediction based on the distribution of the logit values corresponding to this prediction.
arXiv Detail & Related papers (2022-10-14T12:32:50Z)
- Bayesian Neural Network Versus Ex-Post Calibration For Prediction Uncertainty [0.2343856409260935]
Probabilistic predictions from neural networks account for predictive uncertainty during classification.
In practice, most networks are trained as non-probabilistic models, which by default do not capture this inherent uncertainty.
A plausible alternative to the calibration approach is to use Bayesian neural networks, which directly model a predictive distribution.
arXiv Detail & Related papers (2022-09-29T07:22:19Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods can increase a network's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Revisiting Explicit Regularization in Neural Networks for Well-Calibrated Predictive Uncertainty [6.09170287691728]
In this work, we revisit the importance of explicit regularization for obtaining well-calibrated predictive uncertainty.
We introduce a measure of calibration performance, which is lower bounded by the log-likelihood.
We then explore explicit regularization techniques for improving the log-likelihood on unseen samples, which provides well-calibrated predictive uncertainty.
arXiv Detail & Related papers (2020-06-11T13:14:01Z)
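Calibration is the recurring theme linking the abstract to several of the related papers above. A standard way to quantify calibration performance is the expected calibration error (ECE), which bins predictions by confidence and compares the average confidence to the empirical accuracy within each bin. The sketch below is a generic illustration of that metric, not code from any of the listed papers; the bin count and function name are assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over
    equal-width confidence bins. `confidences` holds the model's
    top predicted probability per sample; `correct` holds 0/1
    indicators of whether the top prediction was right."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue  # skip empty bins
        acc = correct[mask].mean()        # empirical accuracy in this bin
        conf = confidences[mask].mean()   # average confidence in this bin
        ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

For a perfectly calibrated model the per-bin accuracy matches the per-bin confidence and the ECE is zero; an overconfident network, the failure mode the abstract's regularizer targets, shows up as confidence exceeding accuracy.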
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.