InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
- URL: http://arxiv.org/abs/2211.00795v1
- Date: Wed, 2 Nov 2022 00:18:25 GMT
- Title: InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss
- Authors: Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
- Abstract summary: Momentum PL (MPL) trains a connectionist temporal classification (CTC)-based model on unlabeled data.
CTC is well suited for MPL, or PL-based semi-supervised ASR in general, owing to its simple/fast inference algorithm and robustness against generating collapsed labels.
We propose to enhance MPL by introducing an intermediate loss, inspired by recent advances in CTC-based modeling.
- Score: 43.39035144463951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents InterMPL, a semi-supervised learning method of end-to-end
automatic speech recognition (ASR) that performs pseudo-labeling (PL) with
intermediate supervision. Momentum PL (MPL) trains a connectionist temporal
classification (CTC)-based model on unlabeled data by continuously generating
pseudo-labels on the fly and improving their quality. In contrast to
autoregressive formulations, such as the attention-based encoder-decoder and
transducer, CTC is well suited for MPL, or PL-based semi-supervised ASR in
general, owing to its simple/fast inference algorithm and robustness against
generating collapsed labels. However, CTC generally performs worse than
autoregressive models due to its conditional independence assumption, thereby
limiting the performance of MPL. We propose to enhance MPL by introducing an
intermediate loss, inspired by recent advances in CTC-based
modeling. Specifically, we focus on self-conditioned and hierarchical
conditional CTC, which apply auxiliary CTC losses to intermediate layers such
that the conditional independence assumption is explicitly relaxed. We also
explore how pseudo-labels should be generated and used as supervision for
intermediate losses. Experimental results in different semi-supervised settings
demonstrate that the proposed approach outperforms MPL and improves an ASR
model by up to a 12.1% absolute performance gain. In addition, our detailed
analysis validates the importance of the intermediate loss.
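As a rough illustration of the training recipe described in the abstract, the PyTorch sketch below pairs a momentum (offline) model that generates pseudo-labels on the fly with an online model trained under both a final and an intermediate CTC loss. The model interface, greedy decoding, the 0.5 loss weight, and the 0.999 momentum coefficient are assumptions for illustration, not the paper's exact configuration.
```python
# Minimal sketch, not the authors' implementation. Assumes the encoder
# returns (final_logits, inter_logits), each shaped (T, B, V), and that
# the CTC blank id is 0.
import torch
import torch.nn.functional as F

def greedy_ctc_labels(logits, blank=0):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    frames = logits.argmax(-1).transpose(0, 1)  # (B, T)
    seqs, lens = [], []
    for seq in frames:
        prev, out = blank, []
        for tok in seq.tolist():
            if tok != blank and tok != prev:
                out.append(tok)
            prev = tok
        seqs.append(torch.tensor(out, dtype=torch.long))
        lens.append(len(out))
    return torch.cat(seqs), torch.tensor(lens, dtype=torch.long)

def intermpl_loss(online, offline, feats, out_lens, inter_w=0.5):
    """Unsupervised loss on unlabeled speech: the momentum (offline) model
    generates pseudo-labels on the fly; the online model is trained with
    CTC losses at both the final and an intermediate layer.
    `out_lens` holds the encoder output lengths per utterance."""
    with torch.no_grad():
        off_final, _ = offline(feats)
        labels, label_lens = greedy_ctc_labels(off_final)
    final_logits, inter_logits = online(feats)
    lp_final = F.log_softmax(final_logits, dim=-1)
    lp_inter = F.log_softmax(inter_logits, dim=-1)
    return ((1 - inter_w) * F.ctc_loss(lp_final, labels, out_lens, label_lens)
            + inter_w * F.ctc_loss(lp_inter, labels, out_lens, label_lens))

@torch.no_grad()
def momentum_update(online, offline, alpha=0.999):
    """After each step, the offline model tracks an exponential moving
    average of the online weights (mean-teacher style)."""
    for p_off, p_on in zip(offline.parameters(), online.parameters()):
        p_off.mul_(alpha).add_(p_on, alpha=1 - alpha)
```
For simplicity the sketch supervises both layers with the same pseudo-labels; in the hierarchical conditional variant the intermediate loss would instead target labels of a different granularity.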
Related papers
- CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling [10.886757419138343]
We propose a novel semi-supervised approach for KIE with Class-Rebalancing and Merged Semantic Pseudo-Labeling (CRMSP).
The CRP module introduces a reweighting factor to rebalance pseudo-labels, increasing attention to tail classes.
The MSP module clusters tail features of unlabeled data by assigning samples to Merged Prototypes (MP).
arXiv Detail & Related papers (2024-07-19T07:41:26Z)
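The reweighting idea lends itself to a short sketch. Below is a hypothetical Python illustration of rebalancing a pseudo-label loss by class frequency; the actual CRMSP reweighting factor is defined in that paper and may differ.
```python
# Hypothetical class-rebalancing of a pseudo-label loss: weight each
# pseudo-labeled sample inversely to its class frequency so tail classes
# receive more attention. `beta` controls the rebalancing strength.
import torch
import torch.nn.functional as F

def rebalanced_pl_loss(logits, pseudo_labels, num_classes, beta=0.5):
    counts = torch.bincount(pseudo_labels, minlength=num_classes).float() + 1.0
    weights = counts.pow(-beta)          # rarer classes -> larger weight
    weights = weights / weights.mean()   # keep the overall loss scale stable
    per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (weights[pseudo_labels] * per_sample).mean()
```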
- A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification [61.473485511491795]
Semi-supervised learning (SSL) is a practical challenge in computer vision.
Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve state-of-the-art (SOTA) performance in SSL.
We propose a lightweight channel-based ensemble method that consolidates multiple inferior PLs into a single pseudo-label that is theoretically guaranteed to be unbiased and low-variance.
arXiv Detail & Related papers (2024-03-27T09:49:37Z)
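As a hedged sketch of the general idea, averaging several pseudo-label estimates reduces variance before a FixMatch-style confidence threshold is applied; the paper's channel-grouping scheme and theoretical guarantees are not reproduced here.
```python
# Hypothetical ensembling of pseudo-label estimates: average the class
# probabilities from several predictors, then keep only confident labels.
import torch

def ensemble_pseudo_labels(prob_list, threshold=0.95):
    """prob_list: list of (B, C) class-probability tensors from different
    predictors (e.g., channel sub-groups). Returns labels and a keep mask."""
    mean_probs = torch.stack(prob_list).mean(dim=0)  # variance-reducing average
    conf, labels = mean_probs.max(dim=-1)
    return labels, conf >= threshold
```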
- Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach [14.69981874614434]
We show how to better optimize a text recognition model from the perspective of loss functions.
CTC-based methods, widely used in practice for their good balance between performance and inference speed, still suffer from accuracy degradation.
We propose a self-distillation scheme for CTC-based models to address this issue.
arXiv Detail & Related papers (2023-08-17T06:32:57Z)
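A minimal sketch of one plausible form of such self-distillation, assuming frame-level KL divergence between a detached final-layer distribution (teacher) and an intermediate-layer distribution (student); the paper's exact scheme may differ.
```python
# Hypothetical self-distillation term for a CTC model: the final layer's
# (detached) frame-wise distribution supervises an intermediate layer,
# typically added alongside the usual CTC losses.
import torch.nn.functional as F

def self_distill_loss(inter_logits, final_logits, temperature=1.0):
    teacher = F.softmax(final_logits.detach() / temperature, dim=-1)
    student = F.log_softmax(inter_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2
```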
- Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels [53.68653940062605]
We introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC).
We find that most LT-MLC and PL-MLC approaches fail to solve the PLT-MLC problem.
We propose an end-to-end learning framework: COrrection → ModificatIon → balanCe.
arXiv Detail & Related papers (2023-04-20T20:05:08Z)
- Improving CTC-based ASR Models with Gated Interlayer Collaboration [9.930655347717932]
We present a Gated Interlayer Collaboration mechanism which introduces contextual information into the models.
We train the model with intermediate CTC losses calculated from the interlayer outputs of the model, whose probability distributions naturally serve as soft label sequences.
arXiv Detail & Related papers (2022-05-25T03:21:27Z)
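A hypothetical sketch of the mechanism's general shape: an intermediate CTC head whose softmax output is projected back and gated into the encoder stream. Module names and the gating form are illustrative assumptions, not the paper's definition.
```python
# Hypothetical gated feedback of an intermediate CTC distribution into the
# encoder: `vocab_proj` maps token probabilities back to the model dimension,
# and a sigmoid gate decides how much context to mix into the hidden states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionedBlock(nn.Module):
    def __init__(self, d_model, vocab_size):
        super().__init__()
        self.ctc_head = nn.Linear(d_model, vocab_size)
        self.vocab_proj = nn.Linear(vocab_size, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                      # x: (B, T, d_model)
        inter_logits = self.ctc_head(x)        # also used for intermediate CTC loss
        ctx = self.vocab_proj(F.softmax(inter_logits, dim=-1))
        g = torch.sigmoid(self.gate(torch.cat([x, ctx], dim=-1)))
        return x + g * ctx, inter_logits       # gated contextual feedback
```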
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
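A hypothetical sketch of how intermediate CTC heads enable depth pruning on demand: training attaches a CTC head per layer, so at inference the encoder can stop at any chosen depth and decode from that layer's head. Class and parameter names are assumptions for illustration.
```python
# Hypothetical depth-prunable encoder: one CTC head per layer; `depth`
# selects how many layers to run at inference time.
import torch.nn as nn

class PrunableEncoder(nn.Module):
    def __init__(self, layers, d_model, vocab_size):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in layers)

    def forward(self, x, depth=None):
        depth = depth or len(self.layers)      # prune on demand
        for layer in self.layers[:depth]:
            x = layer(x)
        return self.heads[depth - 1](x)        # decode from the chosen depth
```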
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
- Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions [14.376418789524783]
We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer.
Our method is easy to implement and retains the merits of CTC-based ASR: a simple model architecture and fast decoding speed.
arXiv Detail & Related papers (2021-04-06T18:00:03Z)
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus.
arXiv Detail & Related papers (2021-02-05T15:01:03Z)