Improving CTC-AED model with integrated-CTC and auxiliary loss regularization
- URL: http://arxiv.org/abs/2308.08449v1
- Date: Tue, 15 Aug 2023 03:31:47 GMT
- Title: Improving CTC-AED model with integrated-CTC and auxiliary loss regularization
- Authors: Daobin Zhu, Xiangdong Su and Hongbin Zhang
- Abstract summary: Connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) joint training has been widely applied in automatic speech recognition (ASR).
In this paper, we employ two fusion methods, namely direct addition of logits (DAL) and preserving the maximum probability (PMP).
We achieve dimensional consistency by adaptively affine transforming the attention results to match the dimensions of CTC.
- Score: 6.214966465876013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Connectionist temporal classification (CTC) and attention-based encoder
decoder (AED) joint training has been widely applied in automatic speech
recognition (ASR). Unlike most hybrid models that separately calculate the CTC
and AED losses, our proposed integrated-CTC utilizes the attention mechanism of
AED to guide the output of CTC. In this paper, we employ two fusion methods,
namely direct addition of logits (DAL) and preserving the maximum probability
(PMP). We achieve dimensional consistency by adaptively affine transforming the
attention results to match the dimensions of CTC. To accelerate model
convergence and improve accuracy, we introduce auxiliary loss
regularization. Experimental results demonstrate that the DAL
method performs better in attention rescoring, while the PMP method excels in
CTC prefix beam search and greedy search.
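To make the fusion step concrete, here is a minimal PyTorch sketch of the two methods and a joint objective. The tensor shapes, the learned affine projection, the loss weight, and the use of the AED cross-entropy as the auxiliary term are illustrative assumptions; the PMP function in particular encodes one plausible reading of "preserving the maximum probability", not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fuse_dal(ctc_logits, att_logits, affine):
    """Direct addition of logits (DAL): affine-project the attention
    logits into the CTC output space, then add them to the CTC logits."""
    # ctc_logits: (T, B, V_ctc); att_logits: (T, B, V_att), assumed
    # already time-aligned for this sketch.
    return ctc_logits + affine(att_logits)

def fuse_pmp(ctc_logits, att_logits, affine):
    """Preserving the maximum probability (PMP), read here as: keep each
    frame's top-scoring CTC entry and fill the remaining entries from
    the projected attention logits."""
    fused = affine(att_logits)
    top = ctc_logits.argmax(dim=-1, keepdim=True)        # (T, B, 1)
    fused.scatter_(-1, top, ctc_logits.gather(-1, top))  # keep the CTC max
    return fused

def integrated_ctc_loss(fused_logits, att_ce, targets, in_lens, tgt_lens,
                        lam=0.3):
    """Joint objective: CTC on the fused logits plus the AED cross-entropy
    as a regularizing auxiliary term (lam=0.3 is a common hybrid
    CTC/attention weight, not a value taken from this paper)."""
    log_probs = fused_logits.log_softmax(-1)  # F.ctc_loss expects log-probs
    ctc = F.ctc_loss(log_probs, targets, in_lens, tgt_lens,
                     blank=0, zero_infinity=True)
    return lam * ctc + (1.0 - lam) * att_ce

# Hypothetical shapes: 100 frames, batch 4, AED vocab 5000 -> CTC vocab 4233.
affine = torch.nn.Linear(5000, 4233)
fused = fuse_dal(torch.randn(100, 4, 4233), torch.randn(100, 4, 5000), affine)
```

The fused logits can then be decoded with standard CTC greedy search or prefix beam search, the settings in which the abstract reports PMP doing best.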
Related papers
- CR-CTC: Consistency regularization on CTC for improved speech recognition [18.996929774821822]
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR)
However, it often falls short in recognition performance compared to transducer models or systems combining CTC and attention-based encoder-decoders (CTC/AED).
We propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram.
arXiv Detail & Related papers (2024-10-07T14:56:07Z)
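As a hedged illustration of the consistency idea summarized above: the utterance is augmented twice, CTC is applied to both views, and a symmetric KL term pulls the two frame-level distributions together. The augmentation hook, the alpha weight, and the no-subsampling assumption are all illustrative.

```python
import torch.nn.functional as F

def cr_ctc_loss(model, augment, feats, lens, targets, tgt_lens, alpha=0.2):
    """Consistency-regularized CTC (sketch): CTC on two independently
    augmented views plus a symmetric KL consistency penalty."""
    lp1 = model(augment(feats), lens).log_softmax(-1)  # (T, B, V)
    lp2 = model(augment(feats), lens).log_softmax(-1)
    ctc = (F.ctc_loss(lp1, targets, lens, tgt_lens, blank=0, zero_infinity=True)
           + F.ctc_loss(lp2, targets, lens, tgt_lens, blank=0, zero_infinity=True))
    # Symmetric KL between the two per-frame output distributions;
    # `lens` doubles as the output length, i.e. no time subsampling.
    kl = (F.kl_div(lp1, lp2, reduction="batchmean", log_target=True)
          + F.kl_div(lp2, lp1, reduction="batchmean", log_target=True))
    return ctc + alpha * kl
```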
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of context-biasing recognition together with improvements in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
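A toy stand-in for the word-spotting step: greedily collapse the CTC output, then scan it for biasing-word token sequences. The actual method matches log-probabilities against a compact context graph with scores and detection thresholds; the function and token ids below are hypothetical.

```python
import torch

def spot_context_words(log_probs, context, blank=0):
    """Toy CTC word spotter: greedy collapse of the best path, then
    exact matching of biasing-word token sequences."""
    best = log_probs.argmax(dim=-1).tolist()  # best token id per frame
    collapsed, prev = [], blank
    for tok in best:                 # standard CTC collapse:
        if tok != blank and tok != prev:
            collapsed.append(tok)    # drop blanks and repeats
        prev = tok
    hits = []
    for word, toks in context.items():
        for i in range(len(collapsed) - len(toks) + 1):
            if collapsed[i:i + len(toks)] == toks:
                hits.append(word)
    return hits

# Hypothetical usage with made-up token ids for one biasing word.
log_probs = torch.randn(50, 30).log_softmax(-1)  # (T=50, V=30)
print(spot_context_words(log_probs, {"nvidia": [7, 12, 4]}))
```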
- Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach [14.69981874614434]
We show how to better optimize a text recognition model from the perspective of loss functions.
CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation.
We propose a self-distillation scheme for CTC-based models to address this issue.
arXiv Detail & Related papers (2023-08-17T06:32:57Z)
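As a generic illustration of self-distillation on top of CTC (not this paper's exact recipe), a no-dropout teacher pass of the same model can supervise the student's frame posteriors via a KL term added to the CTC loss; the beta weight is assumed.

```python
import torch
import torch.nn.functional as F

def self_distill_ctc_loss(model, feats, lens, targets, tgt_lens, beta=0.1):
    """Self-distillation sketch: the model in eval mode (dropout off)
    acts as its own teacher for the training-mode student pass."""
    model.eval()
    with torch.no_grad():
        teacher = model(feats, lens).log_softmax(-1)  # (T, B, V)
    model.train()
    student = model(feats, lens).log_softmax(-1)
    ctc = F.ctc_loss(student, targets, lens, tgt_lens,
                     blank=0, zero_infinity=True)
    kd = F.kl_div(student, teacher, reduction="batchmean", log_target=True)
    return ctc + beta * kd
```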
- Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks [63.189632935619535]
Bayes risk CTC (BRCTC) is proposed to enforce the desired characteristics of the predicted alignment.
By using BRCTC with a preference for early emissions, we obtain an improved performance-latency trade-off for online models.
arXiv Detail & Related papers (2022-10-14T03:55:36Z)
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on the connectionist temporal classification (CTC) objective.
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
- Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition [46.69852287267763]
This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems.
The proposed method significantly reduces recognition errors and emission latency simultaneously.
The best MoChA system shows performance comparable to that of an RNN-transducer (RNN-T).
arXiv Detail & Related papers (2021-02-28T08:17:38Z)
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, achieving a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus.
arXiv Detail & Related papers (2021-02-05T15:01:03Z)
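The intermediate-loss idea itself is compact enough to sketch: apply the same CTC objective to an intermediate encoder layer's output and interpolate it with the final-layer loss. The sketch assumes the model exposes both sets of log-probabilities; w = 0.3 is a commonly used weight for this interpolation.

```python
import torch.nn.functional as F

def intermediate_ctc_loss(final_lp, inter_lp, targets, in_lens, tgt_lens,
                          w=0.3):
    """Intermediate CTC regularization: the usual CTC loss on the final
    layer interpolated with the same loss on an intermediate layer."""
    loss_final = F.ctc_loss(final_lp, targets, in_lens, tgt_lens,
                            blank=0, zero_infinity=True)
    loss_inter = F.ctc_loss(inter_lp, targets, in_lens, tgt_lens,
                            blank=0, zero_infinity=True)
    return (1.0 - w) * loss_final + w * loss_inter
```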
- CTC-synchronous Training for Monotonic Attention Model [43.0382262234792]
Backward probabilities cannot be leveraged in the alignment process during training due to the left-to-right dependency in the decoder.
We propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments.
The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments.
arXiv Detail & Related papers (2020-05-10T16:48:23Z)