Continuous Soft Pseudo-Labeling in ASR
- URL: http://arxiv.org/abs/2211.06007v1
- Date: Fri, 11 Nov 2022 05:16:18 GMT
- Title: Continuous Soft Pseudo-Labeling in ASR
- Authors: Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio
- Abstract summary: Continuous pseudo-labeling (PL) algorithms have emerged as a powerful strategy for semi-supervised learning in speech recognition.
We find that soft-label targets can lead to training divergence, with the model collapsing to a degenerate token distribution per frame.
- Score: 32.19655911858698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently
emerged as a powerful strategy for semi-supervised learning in speech
recognition. In contrast with earlier strategies that alternated between
training a model and generating pseudo-labels (PLs) with it, here PLs are
generated in an end-to-end manner as training proceeds, improving training speed
and the accuracy of the final model. PL shares a common theme with
teacher-student models such as distillation in that a teacher model generates
targets that need to be mimicked by the student model being trained. However,
interestingly, PL strategies in general use hard-labels, whereas distillation
uses the distribution over labels as the target to mimic. Inspired by
distillation, we expect that specifying the whole distribution (aka soft-labels)
over sequences as the target for unlabeled data, instead of a single best pass
pseudo-labeled transcript (hard-labels) should improve PL performance and
convergence. Surprisingly, we find that soft-label targets
can lead to training divergence, with the model collapsing to a degenerate
token distribution per frame. We hypothesize that the reason this does not
happen with hard-labels is that training loss on hard-labels imposes
sequence-level consistency that keeps the model from collapsing to the
degenerate solution. In this paper, we show several experiments that support
this hypothesis, and experiment with several regularization approaches that can
ameliorate the degenerate collapse when using soft-labels. These approaches can
bring the accuracy of soft-labels closer to that of hard-labels, and while they
are unable to outperform them yet, they serve as a useful framework for further
improvements.
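The contrast between hard- and soft-label targets can be made concrete. Below is a minimal PyTorch sketch, illustrative only and not code from the paper or slimIPL: the hard-label path decodes a single transcript from the teacher's output and trains the student with a sequence-level CTC loss, while the soft-label path matches the teacher's full per-frame distribution with a KL divergence. A per-frame entropy penalty is included as one example of the kind of regularization that can discourage collapse to a degenerate token distribution; it is not necessarily the paper's exact choice, and all function and variable names are hypothetical.
```python
# Illustrative sketch (not the paper's implementation) of hard- vs. soft-label
# pseudo-labeling losses for a frame-level (CTC-style) ASR model.
import torch
import torch.nn.functional as F

def hard_label_pl_loss(student_log_probs, teacher_log_probs, input_lengths, blank=0):
    """Hard labels: greedily decode one transcript from the teacher and train the
    student with CTC. CTC scores whole label sequences, which is the kind of
    sequence-level consistency the paper hypothesizes prevents collapse."""
    preds = teacher_log_probs.argmax(dim=-1)  # (T, B): best token per frame
    targets, target_lengths = [], []
    for b in range(preds.shape[1]):
        seq = preds[: input_lengths[b], b].tolist()
        # Collapse repeated tokens and drop blanks (naive greedy CTC decoding).
        collapsed = [t for i, t in enumerate(seq)
                     if t != blank and (i == 0 or t != seq[i - 1])]
        targets.extend(collapsed)
        target_lengths.append(len(collapsed))
    return F.ctc_loss(student_log_probs,
                      torch.tensor(targets, dtype=torch.long),
                      input_lengths,
                      torch.tensor(target_lengths, dtype=torch.long),
                      blank=blank, zero_infinity=True)

def soft_label_pl_loss(student_log_probs, teacher_log_probs, entropy_weight=0.0):
    """Soft labels: match the teacher's full per-frame distribution with a KL
    divergence. On its own this frame-wise objective has no sequence-level
    constraint; the optional entropy penalty is one illustrative regularizer
    against a degenerate per-frame distribution."""
    kl = F.kl_div(student_log_probs, teacher_log_probs,
                  log_target=True, reduction="batchmean")
    if entropy_weight > 0.0:
        probs = student_log_probs.exp()
        # Minimizing negative entropy pushes per-frame distributions away
        # from a collapsed, overly peaked solution.
        neg_entropy = (probs * student_log_probs).sum(dim=-1).mean()
        kl = kl + entropy_weight * neg_entropy
    return kl
```
In a continuous-PL setup such as slimIPL, the "teacher" outputs would come from the model's own (possibly cached or EMA) predictions on unlabeled audio rather than a separately trained network; that detail is omitted from this sketch.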
Related papers
- Reduction-based Pseudo-label Generation for Instance-dependent Partial Label Learning [41.345794038968776]
We propose to leverage reduction-based pseudo-labels to alleviate the influence of incorrect candidate labels.
We show that reduction-based pseudo-labels exhibit greater consistency with the Bayes optimal classifier compared to pseudo-labels directly generated from the predictive model.
arXiv Detail & Related papers (2024-10-28T07:32:20Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- SLaM: Student-Label Mixing for Distillation with Unlabeled Examples [15.825078347452024]
We present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM).
Evaluated on several standard benchmarks, SLaM consistently improves over prior approaches.
As a byproduct of the analysis, we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise.
arXiv Detail & Related papers (2023-02-08T00:14:44Z)
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
- Semi-supervised Contrastive Outlier removal for Pseudo Expectation Maximization (SCOPE) [2.33877878310217]
We present a new approach to suppress confounding errors through a method we describe as Semi-supervised Contrastive Outlier removal for Pseudo Expectation Maximization (SCOPE).
Our results show that SCOPE greatly improves semi-supervised classification accuracy over a baseline, and furthermore when combined with consistency regularization achieves the highest reported accuracy for the semi-supervised CIFAR-10 classification task using 250 and 4000 labeled samples.
arXiv Detail & Related papers (2022-06-28T19:32:50Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Two-phase Pseudo Label Densification for Self-training based Domain Adaptation [93.03265290594278]
We propose a novel Two-phase Pseudo Label Densification framework, referred to as TPLD.
In the first phase, we use sliding window voting to propagate the confident predictions, utilizing intrinsic spatial-correlations in the images.
In the second phase, we perform a confidence-based easy-hard classification.
To ease the training process and avoid noisy predictions, we introduce the bootstrapping mechanism to the original self-training loss.
arXiv Detail & Related papers (2020-12-09T02:35:25Z)
- PseudoSeg: Designing Pseudo Labels for Semantic Segmentation [78.35515004654553]
We present a re-design of pseudo-labeling to generate structured pseudo labels for training with unlabeled or weakly-labeled data.
We demonstrate the effectiveness of the proposed pseudo-labeling strategy in both low-data and high-data regimes.
arXiv Detail & Related papers (2020-10-19T17:59:30Z)