Self-Knowledge Distillation for Learning Ambiguity
- URL: http://arxiv.org/abs/2406.09719v1
- Date: Fri, 14 Jun 2024 05:11:32 GMT
- Title: Self-Knowledge Distillation for Learning Ambiguity
- Authors: Hancheol Park, Soyeong Jeong, Sukmin Cho, Jong C. Park,
- Abstract summary: Recent language models often over-confidently predict a single label without consideration for its correctness.
We propose a novel self-knowledge distillation method that enables models to learn label distributions more accurately.
We validate our method on diverse NLU benchmark datasets and the experimental results demonstrate its effectiveness in producing better label distributions.
- Score: 11.755814660833549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently predicting a single label without consideration for its correctness. To address this issue, we propose a novel self-knowledge distillation method that enables models to learn label distributions more accurately by leveraging knowledge distilled from their lower layers. This approach also includes a learning phase that re-calibrates the unnecessarily strengthened confidence for training samples judged as extremely ambiguous based on the distilled distribution knowledge. We validate our method on diverse NLU benchmark datasets and the experimental results demonstrate its effectiveness in producing better label distributions. Particularly, through the process of re-calibrating the confidence for highly ambiguous samples, the issue of over-confidence when predictions for unseen samples do not match with their ground-truth labels has been significantly alleviated. This has been shown to contribute to generating better distributions than the existing state-of-the-art method. Moreover, our method is more efficient in training the models compared to the existing method, as it does not involve additional training processes to refine label distributions.
Related papers
- InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised
Learning [34.062061310242385]
We present a new perspective of pseudo-labeling for imbalanced semi-supervised learning (SSL)
We measure whether an unlabeled sample is likely to be in-distribution'' or out-of-distribution''
Experiments demonstrate that our energy-based pseudo-labeling method, textbfInPL, significantly outperforms confidence-based methods on imbalanced SSL benchmarks.
arXiv Detail & Related papers (2023-03-13T16:45:41Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised
Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - Multi-View Knowledge Distillation from Crowd Annotations for
Out-of-Domain Generalization [53.24606510691877]
We propose new methods for acquiring soft-labels from crowd-annotations by aggregating the distributions produced by existing methods.
We demonstrate that these aggregation methods lead to the most consistent performance across four NLP tasks on out-of-domain test sets.
arXiv Detail & Related papers (2022-12-19T12:40:18Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - Uncertainty-aware Label Distribution Learning for Facial Expression
Recognition [13.321770808076398]
We propose a new uncertainty-aware label distribution learning method to improve the robustness of deep models against uncertainty and ambiguity.
Our method can be easily integrated into a deep network to obtain more training supervision and improve recognition accuracy.
arXiv Detail & Related papers (2022-09-21T15:48:41Z) - Gray Learning from Non-IID Data with Out-of-distribution Samples [45.788789553551176]
The integrity of training data, even when annotated by experts, is far from guaranteed.
We introduce a novel approach, termed textitGray Learning, which leverages both ground-truth and complementary labels.
By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings.
arXiv Detail & Related papers (2022-06-19T10:46:38Z) - EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning [34.062061310242385]
Recent state-of-the-art methods in semi-supervised learning (SSL) combine consistency regularization with confidence-based pseudo-labeling.
We present a new perspective of pseudo-labeling: instead of relying on model confidence, we instead measure whether an unlabeled sample is likely to be "in-distribution"
arXiv Detail & Related papers (2022-06-13T17:55:07Z) - Credal Self-Supervised Learning [0.0]
We show how to let the learner generate "pseudo-supervision" for unlabeled instances.
In combination with consistency regularization, pseudo-labeling has shown promising performance in various domains.
We compare our methodology to state-of-the-art self-supervision approaches.
arXiv Detail & Related papers (2021-06-22T15:19:04Z) - In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label
Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z) - Deep Semi-supervised Knowledge Distillation for Overlapping Cervical
Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z) - Mind the Trade-off: Debiasing NLU Models without Degrading the
In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.