SlimIPL: Language-Model-Free Iterative Pseudo-Labeling
- URL: http://arxiv.org/abs/2010.11524v5
- Date: Mon, 30 Aug 2021 02:29:45 GMT
- Title: SlimIPL: Language-Model-Free Iterative Pseudo-Labeling
- Authors: Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert
- Abstract summary: Iterative Pseudo-Labeling (IPL) continuously trains a single model using pseudo-labels iteratively re-generated as the model learns.
We call this approach Language-Model-Free IPL (slimIPL) and give a resultant training setup for low-resource settings with CTC-based models.
- Score: 32.39921686482643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent results in end-to-end automatic speech recognition have demonstrated
the efficacy of pseudo-labeling for semi-supervised models trained both with
Connectionist Temporal Classification (CTC) and Sequence-to-Sequence (seq2seq)
losses. Iterative Pseudo-Labeling (IPL), which continuously trains a single
model using pseudo-labels iteratively re-generated as the model learns, has
been shown to further improve performance in ASR. We improve upon the IPL
algorithm: as the model learns, we propose to iteratively re-generate
transcriptions with hard labels (the most probable tokens), that is, without a
language model. We call this approach Language-Model-Free IPL (slimIPL) and
give a resultant training setup for low-resource settings with CTC-based
models. slimIPL features a dynamic cache for pseudo-labels which reduces
sensitivity to changes in relabeling hyperparameters and results in improved
training stability. slimIPL is also highly efficient and requires 3.5-4x fewer
computational resources to converge than other state-of-the-art
semi/self-supervised approaches. With only 10 hours of labeled audio, slimIPL
is competitive with self-supervised approaches, and is state-of-the-art with
100 hours of labeled audio without the use of a language model both at test
time and during pseudo-label generation.
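A minimal sketch of the training loop the abstract describes, in PyTorch-style Python: hard pseudo-labels come from a greedy CTC decode (most probable token per frame, no language model), and unlabeled batches are served from a small dynamic cache that is refreshed with some probability. All names (`greedy_ctc_labels`, `slimipl_step`, `refresh_prob`) and the exact cache policy are illustrative assumptions, not the authors' implementation.
```python
import random
import torch

BLANK = 0  # assumed CTC blank index
ctc = torch.nn.CTCLoss(blank=BLANK, zero_infinity=True)


def greedy_ctc_labels(model, audio):
    """Hard pseudo-labels: argmax per frame, collapse repeats, drop blanks (no LM)."""
    with torch.no_grad():
        log_probs = model(audio)            # assumed shape (time, batch, vocab), log-softmaxed
        best = log_probs.argmax(dim=-1)     # (time, batch)
    labels = []
    for seq in best.t().tolist():
        collapsed, prev = [], None
        for tok in seq:
            if tok != prev and tok != BLANK:
                collapsed.append(tok)
            prev = tok
        labels.append(collapsed)
    return labels


def ctc_loss(log_probs, labels):
    """CTC loss for (time, batch, vocab) log-probs and lists of target token ids."""
    T, B, _ = log_probs.shape
    targets = torch.tensor([t for l in labels for t in l], dtype=torch.long)
    input_lengths = torch.full((B,), T, dtype=torch.long)
    target_lengths = torch.tensor([len(l) for l in labels], dtype=torch.long)
    return ctc(log_probs, targets, input_lengths, target_lengths)


def slimipl_step(model, optimizer, labeled_batch, unlabeled_audio,
                 cache, cache_size=100, refresh_prob=0.5):
    """One update mixing a supervised batch with a batch from the dynamic cache."""
    if len(cache) < cache_size:
        # Warm the cache with hard labels from the current model; train on labeled data only.
        cache.append((unlabeled_audio, greedy_ctc_labels(model, unlabeled_audio)))
        batches = [labeled_batch]
    else:
        idx = random.randrange(len(cache))
        cached = cache[idx]
        if random.random() < refresh_prob:
            # Dynamically replace this slot with freshly relabeled audio.
            cache[idx] = (unlabeled_audio, greedy_ctc_labels(model, unlabeled_audio))
        batches = [labeled_batch, cached]

    optimizer.zero_grad()
    loss = sum(ctc_loss(model(audio), labels) for audio, labels in batches)
    loss.backward()
    optimizer.step()
    return loss.item()
```
Here `labeled_batch` is assumed to be an `(audio, labels)` pair in the same format as the cached entries; the cache size and refresh probability are the relabeling hyperparameters the dynamic cache is meant to make training less sensitive to.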
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
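The contrast between threshold filtering and importance weighting can be sketched in a few lines of Python; the weighting rule shown is an illustrative stand-in, not the paper's training-dynamics-based weights.
```python
import torch
import torch.nn.functional as F


def filtered_loss(logits, labels, confidences, threshold=0.9):
    """Baseline: discard automatically labeled examples below an arbitrary threshold."""
    keep = confidences >= threshold
    if not keep.any():
        return logits.new_zeros(())
    return F.cross_entropy(logits[keep], labels[keep])


def weighted_loss(logits, labels, weights):
    """Weighting instead of filtering: every automatically labeled example contributes,
    scaled by an importance weight (assumed here to come from training dynamics)."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_example).sum() / weights.sum()
```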
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Uncertainty-aware Sampling for Long-tailed Semi-supervised Learning [89.98353600316285]
We introduce uncertainty into the modeling process for pseudo-label sampling, taking into account that the model performance on the tailed classes varies over different training stages.
This approach allows the model to perceive the uncertainty of pseudo-labels at different training stages, thereby adaptively adjusting the selection thresholds for different classes.
Compared to other methods such as the baseline method FixMatch, UDTS achieves an increase in accuracy of at least approximately 5.26%, 1.75%, 9.96%, and 1.28% on the natural scene image datasets.
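A rough sketch of the class-adaptive threshold idea, assuming per-class uncertainty estimates in [0, 1]; the specific rule and constants are illustrative, not the paper's exact formulation.
```python
import torch


def select_pseudo_labels(probs, class_uncertainty, base_threshold=0.95, scale=0.3):
    """probs: (N, C) softmax outputs; class_uncertainty: (C,) estimates in [0, 1]."""
    conf, pred = probs.max(dim=-1)
    # Higher uncertainty for a class -> lower threshold, so tail classes can still
    # contribute pseudo-labels while the model is weak on them.
    thresholds = base_threshold - scale * class_uncertainty[pred]
    keep = conf >= thresholds
    return pred[keep], keep
```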
arXiv Detail & Related papers (2024-01-09T08:59:39Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
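The self-learning view, and the reuse of annotations for generation, can be sketched as follows; every name here is a placeholder for illustration only.
```python
def self_learning_round(parser, unlabeled_texts):
    """The parser being trained annotates unlabeled text; the resulting pairs serve as
    new supervision for the parser and, reversed, as data for a text generator."""
    pseudo_pairs = [(text, parser(text)) for text in unlabeled_texts]
    parser_data = pseudo_pairs                                    # text -> annotation
    generator_data = [(ann, text) for text, ann in pseudo_pairs]  # annotation -> text
    return parser_data, generator_data
```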
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Continuous Soft Pseudo-Labeling in ASR [32.19655911858698]
Continuous pseudo-labeling (PL) algorithms have emerged as a powerful strategy for semi-supervised learning in speech recognition.
We find that soft-label targets can lead to training divergence, with the model collapsing to a degenerate token distribution per frame.
arXiv Detail & Related papers (2022-11-11T05:16:18Z)
- Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining vs. Semi-Supervised Training [6.523198497365586]
Self-supervised Transformer-based models, such as wav2vec 2.0 and HuBERT, have produced significant improvements over existing approaches to automatic speech recognition (ASR).
We show that continued pretraining (CoPT) results in word error rates (WERs) equal to or slightly better than using semi-supervised training (SST).
In addition, we show that using the CoPT model for pseudo-labeling, and using these labels in SST, results in further improvements in WER.
arXiv Detail & Related papers (2022-07-01T21:02:51Z)
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
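The interaction between the two models can be sketched as a mean-teacher-style momentum update, where the offline model is an exponential moving average of the online one and generates the pseudo-labels; the decay value is illustrative.
```python
import torch


@torch.no_grad()
def momentum_update(online_model, offline_model, decay=0.999):
    """Offline (pseudo-label-generating) weights track an EMA of the online weights."""
    for p_online, p_offline in zip(online_model.parameters(), offline_model.parameters()):
        p_offline.mul_(decay).add_(p_online, alpha=1.0 - decay)
```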
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
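A sketch of uncertainty-aware selection in the spirit of UPS: a pseudo-label is kept only if it is both confident and low-uncertainty. MC dropout is used here as one possible uncertainty estimate; the thresholds and the estimator itself are assumptions, not the paper's exact recipe.
```python
import torch


@torch.no_grad()
def ups_style_selection(model, x, n_samples=10, conf_thresh=0.9, unc_thresh=0.05):
    """Keep pseudo-labels that are confident and whose predicted-class probability
    varies little across stochastic forward passes (MC dropout)."""
    model.train()  # keep dropout active so repeated passes differ
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])  # (S, N, C)
    mean_probs = probs.mean(dim=0)
    conf, pred = mean_probs.max(dim=-1)
    uncertainty = probs.std(dim=0).gather(-1, pred.unsqueeze(-1)).squeeze(-1)
    keep = (conf >= conf_thresh) & (uncertainty <= unc_thresh)
    return pred[keep], keep
```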
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Semi-Supervised Speech Recognition via Graph-based Temporal Classification [59.58318952000571]
Semi-supervised learning has demonstrated promising results in automatic speech recognition by self-training.
The effectiveness of this approach largely relies on the pseudo-label accuracy.
Alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance.
arXiv Detail & Related papers (2020-10-29T14:56:56Z)
- Iterative Pseudo-Labeling for Speech Recognition [35.48685001317295]
Pseudo-labeling has recently shown promise in end-to-end automatic speech recognition (ASR).
We study Iterative Pseudo-Labeling (IPL), a semi-supervised algorithm which efficiently performs multiple iterations of pseudo-labeling on unlabeled data.
arXiv Detail & Related papers (2020-05-19T07:56:21Z)