Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition
- URL: http://arxiv.org/abs/2210.16318v1
- Date: Fri, 28 Oct 2022 16:15:58 GMT
- Title: Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition
- Authors: Zezhong Jin, Dading Zhong, Xiao Song, Zhaoyi Liu, Naipeng Ye, Qingcheng Zeng
- Abstract summary: Low-quality pseudo labels can misguide decision boundaries and degrade performance.
We propose a simple yet effective strategy to filter out low-quality pseudo labels.
Experiments on LibriSpeech show that these filtered samples enable the refined model to yield more correct predictions.
- Score: 5.735000563764309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning self-supervised pretrained models using pseudo labels
can effectively improve speech recognition performance. However, low-quality
pseudo labels can misguide decision boundaries and degrade performance. We
propose a simple yet effective strategy to filter out low-quality pseudo
labels to alleviate this problem. Specifically, pseudo labels are produced
over the entire training set and filtered via average probability scores
calculated from the model output. Subsequently, an optimal percentage of
utterances with high probability scores are treated as reliable training data
with trustworthy labels. The model is iteratively updated to correct the
unreliable pseudo labels and minimize the effect of noisy labels. This process
is repeated until the unreliable pseudo labels have been adequately corrected.
Extensive experiments on LibriSpeech show that these filtered samples enable
the refined model to yield more correct predictions, leading to better ASR
performance under various experimental settings.
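The abstract's filtering loop is concrete enough to sketch. Below is a minimal Python illustration of one filter-and-evolve cycle, assuming hypothetical `model.transcribe` and `model.finetune` interfaces (the paper does not name an API); it shows the score-rank-filter-refine idea, not the authors' exact training recipe.

```python
import numpy as np

def filter_and_evolve(model, utterances, keep_ratio=0.6, rounds=3):
    """One possible rendering of the filter-and-evolve loop.

    Assumes a hypothetical `model.transcribe(utt)` returning a hypothesis
    string and per-token probabilities, and `model.finetune(pairs)` that
    updates the model on (utterance, pseudo-label) pairs.
    """
    for _ in range(rounds):
        scored = []
        for utt in utterances:
            hyp, token_probs = model.transcribe(utt)
            # Average per-token probability acts as the confidence score.
            scored.append((float(np.mean(token_probs)), utt, hyp))
        # Keep only the top fraction of utterances as trustworthy labels.
        scored.sort(key=lambda item: item[0], reverse=True)
        trusted = scored[: int(keep_ratio * len(scored))]
        # Refining on the filtered subset lets the next round re-label
        # (and thereby correct) previously unreliable pseudo labels.
        model.finetune([(utt, hyp) for _, utt, hyp in trusted])
    return model
```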
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
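A rough Python sketch of the prototype-based pseudo-labeling idea from the entry above. The paper's distribution-matching objective is more involved; this shows only the nearest-prototype assignment step, with all function and variable names invented for illustration.

```python
import numpy as np

def prototype_pseudo_labels(features, noisy_labels, num_classes):
    """Nearest-prototype pseudo labeling (cosine similarity).

    `features` is an (N, D) array of sample embeddings and `noisy_labels`
    an (N,) array of possibly noisy class ids; both names are invented.
    """
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(noisy_labels)
    # L2-normalize so dot products are cosine similarities.
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    # Class prototype = mean embedding of the samples assigned to it.
    protos = np.stack([feats[labels == c].mean(axis=0)
                       for c in range(num_classes)])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    # Pseudo label = class whose prototype is most similar.
    return (feats @ protos.T).argmax(axis=1)
```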
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition [49.42732949233184]
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition.
Taking noisy labels as ground-truth in the loss function results in suboptimal performance.
We propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels.
arXiv Detail & Related papers (2023-08-12T12:13:52Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Learning from Noisy Labels with Decoupled Meta Label Purifier [33.87292143223425]
Training deep neural networks with noisy labels is challenging since DNNs can easily memorize inaccurate labels.
In this paper, we propose a novel multi-stage label purifier named DMLP.
DMLP decouples the label correction process into label-free representation learning and a simple meta label purifier.
arXiv Detail & Related papers (2023-02-14T03:39:30Z)
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
- LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification [28.37907856670151]
Pseudo labels are noisy by nature, so selecting the correct ones offers a large potential performance boost.
We propose LOPS, a novel pseudo-label selection method that takes the learning order of samples into consideration.
LOPS can be viewed as a strong performance-boost plug-in to most existing weakly-supervised text classification methods.
arXiv Detail & Related papers (2022-05-25T06:46:48Z)
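The learning-order intuition behind LOPS can be sketched briefly: samples whose pseudo labels the model fits early in training tend to be labeled correctly. The helper below is a hypothetical illustration of that selection rule, not the exact LOPS procedure.

```python
def select_by_learning_order(first_learned_epoch, keep_ratio=0.5):
    """Keep the pseudo-labeled samples the model fit earliest.

    `first_learned_epoch[i]` is the epoch at which the model's prediction
    for sample i first matched its pseudo label (recorded during training).
    Earlier-learned samples are more likely to be labeled correctly.
    """
    order = sorted(range(len(first_learned_epoch)),
                   key=lambda i: first_learned_epoch[i])
    return order[: int(keep_ratio * len(order))]
```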
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general semi-supervised learning approach, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
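The UPS idea above lends itself to a short sketch: keep a pseudo label only when mean confidence is high and prediction variance across stochastic forward passes (e.g., MC dropout) is low. Shapes, names, and thresholds below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ups_style_selection(mc_probs, conf_thresh=0.9, unc_thresh=0.05):
    """Keep pseudo labels that are both confident and low-uncertainty.

    `mc_probs` has shape (T, N, C): T stochastic forward passes (dropout
    left on) over N samples and C classes. Thresholds are illustrative.
    """
    mean_p = mc_probs.mean(axis=0)          # (N, C) mean class probabilities
    labels = mean_p.argmax(axis=1)          # (N,)  candidate pseudo labels
    conf = mean_p.max(axis=1)               # (N,)  confidence of that label
    # Std-dev across passes for the chosen class is the uncertainty proxy.
    unc = mc_probs.std(axis=0)[np.arange(labels.size), labels]
    keep = (conf >= conf_thresh) & (unc <= unc_thresh)
    return labels, keep
```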
- Error-Bounded Correction of Noisy Labels [17.510654621245656]
We show that the prediction of a noisy classifier can indeed be a good indicator of whether the label of a training sample is clean.
Based on the theoretical result, we propose a novel algorithm that corrects the labels based on the noisy classifier prediction.
We incorporate our label correction algorithm into the training of deep neural networks and train models that achieve superior testing performance on multiple public datasets.
arXiv Detail & Related papers (2020-11-19T19:23:23Z)
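A minimal sketch of confidence-based label correction in the spirit of the entry above: flip a training label when the noisy classifier confidently disagrees with it. The fixed threshold stands in for the paper's theoretically derived bound and is an assumption of this sketch.

```python
import numpy as np

def correct_noisy_labels(probs, labels, flip_thresh=0.9):
    """Flip a label when the classifier confidently disagrees with it.

    `probs` is an (N, C) array of softmax outputs from the noisily trained
    classifier and `labels` an (N,) array of given (noisy) labels. The
    fixed threshold stands in for the paper's derived error bound.
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    corrected = labels.copy()
    # Only overwrite where the prediction disagrees with high confidence.
    flip = (preds != labels) & (conf >= flip_thresh)
    corrected[flip] = preds[flip]
    return corrected
```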