Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation
- URL: http://arxiv.org/abs/2210.13459v1
- Date: Sat, 22 Oct 2022 11:52:38 GMT
- Title: Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation
- Authors: Dongkyu Lee, Ka Chun Cheung, Nevin L. Zhang
- Abstract summary: We propose a regularization scheme that makes the smoothing parameter dynamic. The model self-regulates the extent of smoothing on the fly during forward propagation.
- Score: 16.878277421402945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overconfidence has been shown to impair the generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to the loss function, preventing the model from producing a peaked distribution. Label smoothing smooths target labels with a pre-defined prior label distribution; as a result, the model is trained to maximize the likelihood of the soft label. Nonetheless, the amount of smoothing is the same for all samples and remains fixed throughout training. In other words, label smoothing does not reflect how the probability distribution produced by the model changes over the course of training. To address this issue, we propose a regularization scheme that makes the smoothing parameter dynamic by taking the model's probability distribution into account, thereby varying the parameter per instance. The model in training self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work bridging label smoothing and knowledge distillation, our method uses self-knowledge as the prior label distribution for softening target labels, and we present theoretical support for the regularization effect of knowledge distillation and of the dynamic smoothing parameter. Our regularizer is validated comprehensively, and the results show marked improvements in model generalization and calibration, enhancing the robustness and trustworthiness of a model.
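The abstract gives no implementation, but the mechanism it describes lends itself to a short sketch. Below is a minimal, hypothetical PyTorch rendering of the idea: the model's own (detached) output distribution serves as the prior for softening the target, and a per-instance smoothing coefficient is computed from the model's confidence during the forward pass. The confidence-based schedule for the coefficient is an illustrative assumption; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def adaptive_self_smoothing_loss(logits, targets):
    # Self-knowledge prior: the model's own predicted distribution,
    # detached so the soft target is treated as a constant.
    probs = F.softmax(logits, dim=-1).detach()

    # Per-instance smoothing coefficient. Here it is the model's
    # confidence in the gold token (an assumed schedule, not the
    # paper's exact rule): the more peaked the prediction, the more
    # smoothing is applied.
    alpha = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Soften the one-hot target toward the self-knowledge prior.
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
    soft_targets = (1.0 - alpha).unsqueeze(-1) * one_hot \
        + alpha.unsqueeze(-1) * probs

    # Cross-entropy against the per-instance soft target.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Usage: token-level logits (batch, vocab) and gold token ids (batch,).
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
loss = adaptive_self_smoothing_loss(logits, targets)
loss.backward()
```

Because the smoothing coefficient is recomputed at every forward pass, the amount of regularization tracks the model's current distribution rather than staying fixed, which is the dynamic behavior the abstract emphasizes.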
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Model Calibration in Dense Classification with Adaptive Label Perturbation [44.62722402349157]
Existing dense binary classification models are prone to being over-confident.
We propose Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image.
ASLP significantly improves the calibration of dense binary classification models on both in-distribution and out-of-distribution data.
arXiv Detail & Related papers (2023-07-25T14:40:11Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Rethinking Precision of Pseudo Label: Test-Time Adaptation via Complementary Learning [10.396596055773012]
We propose a novel complementary learning approach to enhance test-time adaptation.
In test-time adaptation tasks, information from the source domain is typically unavailable.
We highlight that the risk function of complementary labels agrees with the vanilla loss formulation.
arXiv Detail & Related papers (2023-01-15T03:36:33Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Instance-based Label Smoothing For Better Calibrated Classification Networks [3.388509725285237]
Label smoothing is widely used in deep neural networks for multi-class classification.
We take inspiration from both label smoothing and self-distillation.
We propose two novel instance-based label smoothing approaches.
arXiv Detail & Related papers (2021-10-11T15:33:23Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing [83.78668073898001]
We introduce a family of entropy regularizers, which includes label smoothing as a special case (see the sketch after this list).
We find that variance in model performance can be explained largely by the resulting entropy of the model.
We advise the use of other entropy regularization methods in its place.
arXiv Detail & Related papers (2020-05-02T12:46:28Z)
- Regularization via Structural Label Smoothing [22.74769739125912]
Regularization is an effective way to promote the generalization performance of machine learning models.
In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network.
We show that such label smoothing imposes a quantifiable bias in the Bayes error rate of the training data.
arXiv Detail & Related papers (2020-01-07T05:45:18Z)
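Several entries above build on standard label smoothing, and the Generalized Entropy Regularization paper frames it as one member of a family of entropy regularizers. As a point of reference for the adaptive scheme sketched earlier, here is a minimal, hypothetical sketch of two such members: fixed uniform label smoothing and a confidence penalty that directly rewards output entropy. Function names and coefficient defaults are illustrative assumptions, not taken from any of the cited papers.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    # Fixed uniform label smoothing: mix the one-hot target with a
    # uniform prior; eps is the same for every sample and every step,
    # which is exactly the rigidity the main paper addresses.
    n = logits.size(-1)
    one_hot = F.one_hot(targets, num_classes=n).float()
    soft_targets = (1.0 - eps) * one_hot + eps / n
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

def confidence_penalty_ce(logits, targets, beta=0.1):
    # Another member of the entropy-regularizer family: plain
    # cross-entropy minus a bonus proportional to output entropy,
    # discouraging peaked distributions directly.
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy
```

Both regularizers pull the model away from overconfident, peaked outputs; they differ only in whether the pressure is expressed through a softened target or an explicit entropy term.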