Intermediate Loss Regularization for CTC-based Speech Recognition
- URL: http://arxiv.org/abs/2102.03216v1
- Date: Fri, 5 Feb 2021 15:01:03 GMT
- Title: Intermediate Loss Regularization for CTC-based Speech Recognition
- Authors: Jaesong Lee, Shinji Watanabe
- Abstract summary: We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus.
- Score: 58.33721897180646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple and efficient auxiliary loss function for automatic
speech recognition (ASR) based on the connectionist temporal classification
(CTC) objective. The proposed objective, an intermediate CTC loss, is attached
to an intermediate layer in the CTC encoder network. This intermediate CTC loss
regularizes CTC training well and improves performance, requiring only a small
modification of the code, a small overhead during training, and no overhead
during inference. In addition, we propose to combine this intermediate CTC loss
with stochastic depth training, and apply the combination to the recently
proposed Conformer network. We evaluate the proposed method on various corpora,
reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character
error rate (CER) of 5.2% on the AISHELL-1 corpus, based on CTC greedy search
without a language model. Notably, the AISHELL-1 result is comparable to other
state-of-the-art ASR systems based on an autoregressive decoder with beam
search.
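The objective lends itself to a short sketch. Below is a minimal PyTorch illustration of attaching a second CTC loss to an intermediate encoder layer; the layer count, the shared projection head, and the weight w = 0.3 are assumptions for illustration, not the paper's exact recipe.

```python
import torch.nn as nn

class InterCTCEncoder(nn.Module):
    """Transformer encoder with a second CTC loss at an intermediate layer."""

    def __init__(self, dim=256, vocab=50, n_layers=12, inter_layer=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
             for _ in range(n_layers)])
        self.inter_layer = inter_layer
        self.head = nn.Linear(dim, vocab)  # CTC projection (shared here for simplicity)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, x, x_lens, ys, y_lens, w=0.3):
        inter_logp = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)                                   # (N, T, dim)
            if i == self.inter_layer:
                inter_logp = self.head(x).log_softmax(-1)  # intermediate prediction
        final_logp = self.head(x).log_softmax(-1)
        # nn.CTCLoss expects (T, N, C) log-probabilities.
        loss_final = self.ctc(final_logp.transpose(0, 1), ys, x_lens, y_lens)
        loss_inter = self.ctc(inter_logp.transpose(0, 1), ys, x_lens, y_lens)
        # Total objective: (1 - w) * final CTC + w * intermediate CTC.
        return (1 - w) * loss_final + w * loss_inter
```

Because the intermediate head only adds one linear projection and one extra CTC evaluation, the training overhead is small, and the head can simply be dropped at inference time.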
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
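As a rough intuition for what a spotter consumes, here is a naive sketch that greedy-decodes CTC log-probabilities and substring-matches a context word. The paper's actual method matches log-probabilities against a compact context graph without decoding first; the `BLANK` index and token IDs here are assumptions.

```python
import torch

BLANK = 0  # assumed blank index

def greedy_collapse(log_probs: torch.Tensor) -> list:
    """Best-path CTC decode: argmax per frame, merge repeats, drop blanks."""
    out, prev = [], None
    for p in log_probs.argmax(dim=-1).tolist():  # log_probs: (T, vocab)
        if p != prev and p != BLANK:
            out.append(p)
        prev = p
    return out

def spot(log_probs: torch.Tensor, word_ids: list) -> bool:
    """True if the word's token sequence occurs in the greedy decode."""
    hyp, m = greedy_collapse(log_probs), len(word_ids)
    return any(hyp[i:i + m] == word_ids for i in range(len(hyp) - m + 1))
```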
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach [14.69981874614434]
We show how to better optimize a text recognition model from the perspective of loss functions.
CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation.
We propose a self-distillation scheme for CTC-based model to address this issue.
arXiv Detail & Related papers (2023-08-17T06:32:57Z)
- Improving CTC-AED model with integrated-CTC and auxiliary loss regularization [6.214966465876013]
Connectionist temporal classification and attention-based encoder decoder (AED) joint training has been widely applied in automatic speech recognition (ASR).
In this paper, we employ two fusion methods, namely direct addition of logits (DAL) and preserving the maximum probability (PMP).
We achieve dimensional consistency by adaptively applying an affine transform to the attention results to match the dimensions of CTC.
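DAL in particular is simple to sketch. The affine layer and sizes below are illustrative assumptions, and PMP is omitted because the summary does not specify it precisely.

```python
import torch.nn as nn

dim_att, vocab = 256, 5000          # assumed sizes
affine = nn.Linear(dim_att, vocab)  # maps attention output to CTC's dimension

def fuse_dal(ctc_logits, att_out):
    """Direct Addition of Logits (DAL): affine-transform the attention
    output into the CTC dimension, then add the logits elementwise."""
    return ctc_logits + affine(att_out)
```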
arXiv Detail & Related papers (2023-08-15T03:31:47Z)
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
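The joint objective is typically a weighted sum of the two losses. A minimal sketch, assuming standard PyTorch losses and a weight lambda = 0.3 (a common choice in hybrid CTC/attention training, not necessarily this paper's setting):

```python
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
att_loss = nn.CrossEntropyLoss(ignore_index=-1)  # -1 pads decoder targets

def joint_loss(enc_logp, enc_lens, ys, y_lens, dec_logits, ys_out, lam=0.3):
    """L = lam * L_CTC + (1 - lam) * L_attention."""
    # enc_logp: (N, T, C) encoder log-probs; nn.CTCLoss wants (T, N, C).
    l_ctc = ctc_loss(enc_logp.transpose(0, 1), ys, enc_lens, y_lens)
    # dec_logits: (N, L, C) decoder logits against padded targets ys_out.
    l_att = att_loss(dec_logits.reshape(-1, dec_logits.size(-1)),
                     ys_out.reshape(-1))
    return lam * l_ctc + (1 - lam) * l_att
```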
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
- CTC Variations Through New WFST Topologies [79.94035631317395]
This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition.
Three new CTC variants are proposed: (1) the "compact-CTC", in which direct transitions between units are replaced with <epsilon> back-off transitions; (2) the "minimal-CTC", which only adds <blank> self-loops when used in WFST composition; and (3) the "selfless-CTC", which disallows self-loops for non-blank units.
arXiv Detail & Related papers (2021-10-06T23:00:15Z)
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned in various depth on demand, improving real-time factor from 0.005 to 0.002 on GPU.
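A hedged sketch of the inference-time idea: stop the encoder at a chosen depth and decode from a CTC head trained at that depth. The layer counts and per-depth heads are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PrunableEncoder(nn.Module):
    """Encoder whose depth can be chosen on demand at inference time."""

    def __init__(self, dim=256, vocab=50, n_layers=12, head_depths=(6, 9, 12)):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
             for _ in range(n_layers)])
        # One CTC head per allowed depth; training them with intermediate CTC
        # losses is what makes the shallow exits usable.
        self.heads = nn.ModuleDict(
            {str(d): nn.Linear(dim, vocab) for d in head_depths})

    @torch.no_grad()
    def decode_at_depth(self, x, depth):
        for layer in self.layers[:depth]:  # layers above `depth` are skipped
            x = layer(x)
        return self.heads[str(depth)](x).log_softmax(-1)
```

Running fewer layers is what drives the real-time-factor improvement: a shallower exit trades some accuracy for proportionally less compute.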
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
- Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions [14.376418789524783]
We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer.
Our method is easy to implement and retains the merits of CTC-based ASR: a simple model architecture and fast decoding speed.
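A minimal sketch of this self-conditioning idea: project the intermediate CTC posteriors back into the encoder dimension and add them to the hidden states, so later layers see the intermediate prediction. The sizes and the single conditioning point are assumptions.

```python
import torch.nn as nn

class SelfConditionedEncoder(nn.Module):
    """Encoder whose upper layers are conditioned on an intermediate CTC output."""

    def __init__(self, dim=256, vocab=50, n_layers=12, cond_layer=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
             for _ in range(n_layers)])
        self.cond_layer = cond_layer
        self.out = nn.Linear(dim, vocab)   # shared CTC projection
        self.back = nn.Linear(vocab, dim)  # maps posteriors back to hidden dim

    def forward(self, x):
        inter_logp = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == self.cond_layer:
                inter_logp = self.out(x).log_softmax(-1)  # auxiliary CTC input
                # Feed the intermediate prediction back in, relaxing CTC's
                # conditional independence assumption for the upper layers.
                x = x + self.back(inter_logp.exp())
        return self.out(x).log_softmax(-1), inter_logp
```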
arXiv Detail & Related papers (2021-04-06T18:00:03Z)
- Improved Mask-CTC for Non-Autoregressive End-to-End ASR [49.192579824582694]
This work improves a recently proposed end-to-end ASR system based on mask-predict with connectionist temporal classification (CTC).
First, we propose to enhance the network by adopting the recently proposed Conformer architecture.
Next, we propose new training and decoding methods by introducing auxiliary objective to predict the length of a partial target sequence.
arXiv Detail & Related papers (2020-10-26T01:22:35Z)
- Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss [5.707652271634435]
We propose Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistencies.
CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path.
Compared to the same model trained with the regular CTC loss, our method consistently improved ASR performance on both code-switching (CS) and monolingual corpora.
arXiv Detail & Related papers (2020-05-16T09:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.