Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation
- URL: http://arxiv.org/abs/2210.13459v1
- Date: Sat, 22 Oct 2022 11:52:38 GMT
- Title: Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation
- Authors: Dongkyu Lee, Ka Chun Cheung, Nevin L. Zhang
- Abstract summary: We propose a regularization scheme that makes the smoothing parameter dynamic. The model self-regulates the extent of smoothing on the fly during forward propagation.
- Score: 16.878277421402945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overconfidence has been shown to impair the generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to the loss function, preventing the model from producing a peaked distribution. Label smoothing smooths target labels with a pre-defined prior label distribution; as a result, the model is trained to maximize the likelihood of the soft label. Nonetheless, the amount of smoothing is the same for all samples and remains fixed throughout training. In other words, label smoothing does not reflect how the probability distribution produced by the model changes over the course of training. To address this issue, we propose a regularization scheme that makes the smoothing parameter dynamic by taking the model's probability distribution into account, thereby varying the parameter per instance. The model in training self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work bridging label smoothing and knowledge distillation, our method uses self-knowledge as the prior label distribution for softening target labels, and we present theoretical support for the regularization effect of knowledge distillation and of the dynamic smoothing parameter. Our regularizer is validated comprehensively, and the results show marked improvements in model generalization and calibration, enhancing the robustness and trustworthiness of a model.
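The abstract gives no implementation, but the mechanism it describes lends itself to a short sketch. Below is a minimal, hypothetical PyTorch rendering of the idea: the model's own (detached) output distribution serves as the prior for softening the target, and a per-instance smoothing coefficient is computed from the model's confidence during the forward pass. The confidence-based schedule for the coefficient is an illustrative assumption; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def adaptive_self_smoothing_loss(logits, targets):
    # Self-knowledge prior: the model's own predicted distribution,
    # detached so the soft target is treated as a constant.
    probs = F.softmax(logits, dim=-1).detach()

    # Per-instance smoothing coefficient. Here it is the model's
    # confidence in the gold token (an assumed schedule, not the
    # paper's exact rule): the more peaked the prediction, the more
    # smoothing is applied.
    alpha = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Soften the one-hot target toward the self-knowledge prior.
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
    soft_targets = (1.0 - alpha).unsqueeze(-1) * one_hot \
        + alpha.unsqueeze(-1) * probs

    # Cross-entropy against the per-instance soft target.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Usage: token-level logits (batch, vocab) and gold token ids (batch,).
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
loss = adaptive_self_smoothing_loss(logits, targets)
loss.backward()
```

Because the smoothing coefficient is recomputed at every forward pass, the amount of regularization tracks the model's current distribution rather than staying fixed, which is the dynamic behavior the abstract emphasizes.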
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Model Calibration in Dense Classification with Adaptive Label Perturbation [44.62722402349157]
Existing dense binary classification models are prone to being over-confident.
We propose Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image.
ASLP significantly improves the calibration of dense binary classification models on both in-distribution and out-of-distribution data.
arXiv Detail & Related papers (2023-07-25T14:40:11Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Rethinking Precision of Pseudo Label: Test-Time Adaptation via Complementary Learning [10.396596055773012]
We propose a novel complementary learning approach to enhance test-time adaptation.
In test-time adaptation tasks, information from the source domain is typically unavailable.
We highlight that the risk function of complementary labels agrees with the vanilla loss formulation.
arXiv Detail & Related papers (2023-01-15T03:36:33Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Instance-based Label Smoothing For Better Calibrated Classification Networks [3.388509725285237]
Label smoothing is widely used in deep neural networks for multi-class classification.
We take inspiration from both label smoothing and self-distillation.
We propose two novel instance-based label smoothing approaches.
arXiv Detail & Related papers (2021-10-11T15:33:23Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing [83.78668073898001]
We introduce a family of entropy regularizers, which includes label smoothing as a special case (see the sketch after this list).
We find that variance in model performance can be explained largely by the resulting entropy of the model.
We advise the use of other entropy regularization methods in its place.
arXiv Detail & Related papers (2020-05-02T12:46:28Z)
- Regularization via Structural Label Smoothing [22.74769739125912]
Regularization is an effective way to promote the generalization performance of machine learning models.
In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network.
We show that such label smoothing imposes a quantifiable bias in the Bayes error rate of the training data.
arXiv Detail & Related papers (2020-01-07T05:45:18Z)
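Several entries above build on standard label smoothing, and the Generalized Entropy Regularization paper frames it as one member of a family of entropy regularizers. As a point of reference for the adaptive scheme sketched earlier, here is a minimal, hypothetical sketch of two such members: fixed uniform label smoothing and a confidence penalty that directly rewards output entropy. Function names and coefficient defaults are illustrative assumptions, not taken from any of the cited papers.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    # Fixed uniform label smoothing: mix the one-hot target with a
    # uniform prior; eps is the same for every sample and every step,
    # which is exactly the rigidity the main paper addresses.
    n = logits.size(-1)
    one_hot = F.one_hot(targets, num_classes=n).float()
    soft_targets = (1.0 - eps) * one_hot + eps / n
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

def confidence_penalty_ce(logits, targets, beta=0.1):
    # Another member of the entropy-regularizer family: plain
    # cross-entropy minus a bonus proportional to output entropy,
    # discouraging peaked distributions directly.
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy
```

Both regularizers pull the model away from overconfident, peaked outputs; they differ only in whether the pressure is expressed through a softened target or an explicit entropy term.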