Distilling the Knowledge of BERT for CTC-based ASR
- URL: http://arxiv.org/abs/2209.02030v1
- Date: Mon, 5 Sep 2022 16:08:35 GMT
- Title: Distilling the Knowledge of BERT for CTC-based ASR
- Authors: Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai,
Tatsuya Kawahara
- Abstract summary: We propose to distill the knowledge of BERT for CTC-based ASR.
CTC-based ASR learns the knowledge of BERT during training and does not use BERT during testing.
We show that our method improves the performance of CTC-based ASR without sacrificing inference speed.
- Score: 38.345330002791606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Connectionist temporal classification (CTC)-based models are attractive
because of their fast inference in automatic speech recognition (ASR). Language
model (LM) integration approaches such as shallow fusion and rescoring can
improve the recognition accuracy of CTC-based ASR by taking advantage of the
knowledge in text corpora. However, they significantly slow down the inference
of CTC. In this study, we propose to distill the knowledge of BERT for
CTC-based ASR, extending our previous study for attention-based ASR. CTC-based
ASR learns the knowledge of BERT during training and does not use BERT during
testing, which maintains the fast inference of CTC. Different from
attention-based models, CTC-based models make frame-level predictions, so they
need to be aligned with token-level predictions of BERT for distillation. We
propose to obtain alignments by calculating the most plausible CTC paths.
Experimental evaluations on the Corpus of Spontaneous Japanese (CSJ) and
TED-LIUM2 show that our method improves the performance of CTC-based ASR
without sacrificing inference speed.
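
The abstract only sketches the alignment step, so here is a minimal, hypothetical NumPy illustration of the idea rather than the authors' implementation: a Viterbi pass over the CTC topology recovers the most plausible path for the reference transcript, which maps each frame to a token position, and the aligned frames are then pulled toward BERT's soft token distributions. The blank index, the function names (`ctc_viterbi_align`, `distill_loss`), and the KL-divergence form of the distillation term are assumptions made for illustration only.

```python
import numpy as np

BLANK = 0  # assumption: the CTC blank symbol is index 0

def ctc_viterbi_align(log_probs, tokens):
    """Most plausible CTC path for `tokens` given frame log-posteriors.

    log_probs: (T, V) frame-level log-posteriors from the CTC model.
    tokens:    reference token ids (no blanks); assumed non-empty and short
               enough that a valid alignment exists within T frames.
    Returns a length-T array mapping each frame to a token position (-1 = blank).
    """
    T = log_probs.shape[0]
    ext = [BLANK]
    for tok in tokens:                       # interleave blanks: b, y1, b, y2, ..., b
        ext.extend([tok, BLANK])
    S = len(ext)
    dp = np.full((T, S), -1e30)
    bp = np.zeros((T, S), dtype=np.int64)    # backpointers
    dp[0, 0] = log_probs[0, ext[0]]
    dp[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [dp[t - 1, s]]                         # stay on the same state
            if s >= 1:
                cands.append(dp[t - 1, s - 1])             # advance by one state
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append(dp[t - 1, s - 2])             # skip an optional blank
            best = int(np.argmax(cands))
            dp[t, s] = cands[best] + log_probs[t, ext[s]]
            bp[t, s] = s - best
    # backtrack from the better of the two admissible final states
    s = S - 1 if dp[T - 1, S - 1] >= dp[T - 1, S - 2] else S - 2
    frame_to_token = np.empty(T, dtype=np.int64)
    for t in range(T - 1, -1, -1):
        frame_to_token[t] = -1 if ext[s] == BLANK else (s - 1) // 2
        s = bp[t, s]
    return frame_to_token

def distill_loss(log_probs, frame_to_token, bert_soft_targets):
    """KL(BERT soft targets || CTC frame posteriors), averaged over aligned frames."""
    loss, n = 0.0, 0
    for t, pos in enumerate(frame_to_token):
        if pos < 0:
            continue                          # frames aligned to blank are skipped
        q = bert_soft_targets[pos]            # (V,) soft label distribution from BERT
        loss += float(np.sum(q * (np.log(q + 1e-12) - log_probs[t])))
        n += 1
    return loss / max(n, 1)
```

Because the teacher is consulted only to produce `bert_soft_targets` during training, nothing in this sketch touches the test-time decoding path, which is how the approach keeps CTC's fast inference, in contrast to shallow fusion or rescoring.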
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
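The entry above names the mechanism only briefly, so the following is a deliberately simplified, hypothetical sketch of the general idea (greedy per-frame argmax matched against a trie of biasing phrases), not the paper's CTC-based Word Spotter; `build_trie`, `spot`, and the `threshold` knob are illustrative assumptions.

```python
import numpy as np

def build_trie(phrases):
    """phrases: list of token-id lists; returns a nested-dict trie with '$' end marks."""
    root = {}
    for phrase in phrases:
        node = root
        for tok in phrase:
            node = node.setdefault(tok, {})
        node["$"] = True
    return root

def spot(log_probs, trie, blank=0, threshold=-5.0):
    """Scan (T, V) CTC log-posteriors and report (start, end) frame spans whose
    greedy tokens trace a biasing phrase with average log-probability above
    `threshold` (a hypothetical tuning knob)."""
    hits = []
    T = log_probs.shape[0]
    for start in range(T):
        node, score, n_tok, prev, t = trie, 0.0, 0, blank, start
        while t < T:
            tok = int(np.argmax(log_probs[t]))
            if tok == blank or tok == prev:    # collapse blanks and CTC repeats
                prev = tok
                t += 1
                continue
            if tok not in node:
                break
            score += log_probs[t, tok]
            n_tok += 1
            node, prev = node[tok], tok
            if "$" in node and score / n_tok > threshold:
                hits.append((start, t))
            t += 1
    return hits
```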
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Less Peaky and More Accurate CTC Forced Alignment by Label Priors [57.48450905027108]
Connectionist temporal classification (CTC) models are known to have peaky output distributions.
This paper aims at alleviating the peaky behavior of CTC and improving its suitability for forced alignment generation.
Our CTC model produces less peaky posteriors and is able to more accurately predict the offset of tokens in addition to their onset.
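As a rough, hedged illustration of the label-prior idea (not the paper's exact recipe), frame log-posteriors can be shifted by scaled log label priors before computing the CTC loss, which down-weights frequent labels such as blank and discourages peaky outputs; the prior estimate, the `prior_scale` value, and the renormalization step below are assumptions.

```python
import torch
import torch.nn.functional as F

def ctc_loss_with_label_priors(logits, targets, in_lens, tgt_lens,
                               log_priors, prior_scale=0.3, blank=0):
    """logits: (T, N, V) raw frame scores; log_priors: (V,) log label frequencies,
    e.g. estimated from the model's own alignments on the training set."""
    log_probs = F.log_softmax(logits, dim=-1)
    # shifting by the prior re-weights frequent labels (especially blank) downwards
    log_probs = log_probs - prior_scale * log_priors.view(1, 1, -1)
    log_probs = F.log_softmax(log_probs, dim=-1)   # renormalize (one possible choice)
    return F.ctc_loss(log_probs, targets, in_lens, tgt_lens,
                      blank=blank, zero_infinity=True)
```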
arXiv Detail & Related papers (2024-04-22T17:40:08Z)
- Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach [14.69981874614434]
We show how to better optimize a text recognition model from the perspective of loss functions.
CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation.
We propose a self-distillation scheme for CTC-based models to address this issue.
arXiv Detail & Related papers (2023-08-17T06:32:57Z)
- BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model [40.16332045057132]
BERT-CTC is a novel formulation of end-to-end speech recognition.
It incorporates linguistic knowledge through explicit output dependencies obtained from BERT contextual embeddings.
BERT-CTC improves over conventional approaches across variations in speaking styles and languages.
arXiv Detail & Related papers (2022-10-29T18:19:44Z)
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
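The joint framework referred to here is usually an interpolation of a frame-level CTC loss and a token-level attention-decoder cross-entropy over the same reference; the sketch below assumes a typical hybrid setup, and the 0.3 `ctc_weight` is only an illustrative value.

```python
import torch
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, ctc_targets, in_lens, tgt_lens,
                             dec_logits, dec_targets,
                             ctc_weight=0.3, blank=0, pad_id=-100):
    """ctc_log_probs: (T, N, V) encoder CTC log-posteriors;
    dec_logits: (N, L, V) attention-decoder scores; dec_targets: (N, L) token ids."""
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets, in_lens, tgt_lens,
                     blank=blank, zero_infinity=True)
    att = F.cross_entropy(dec_logits.transpose(1, 2), dec_targets,
                          ignore_index=pad_id)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att
```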
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
- Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions [14.376418789524783]
We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer.
Our method is easy to implement and retains the merits of CTC-based ASR: a simple model architecture and fast decoding speed.
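A hedged PyTorch sketch of the mechanism described above: auxiliary CTC losses are attached to intermediate encoder layers, and the intermediate posteriors are projected back into the hidden states so that later layers are conditioned on them. The layer indices, module names, and the 0.5 mixing weight are illustrative, and each encoder layer is assumed to map (batch, time, dim) tensors to the same shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfConditionedCTCEncoder(nn.Module):
    def __init__(self, layers, d_model, vocab, inter_indices=(3, 7)):
        super().__init__()
        self.layers = nn.ModuleList(layers)         # e.g. Transformer/Conformer blocks
        self.ctc_head = nn.Linear(d_model, vocab)   # shared CTC projection
        self.reproject = nn.Linear(vocab, d_model)  # feed intermediate predictions back
        self.inter_indices = set(inter_indices)

    def forward(self, x):                           # x: (N, T, d_model)
        inter_log_probs = []
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i in self.inter_indices and i < len(self.layers) - 1:
                lp = F.log_softmax(self.ctc_head(x), dim=-1)
                inter_log_probs.append(lp)
                x = x + self.reproject(lp.exp())    # condition on intermediate predictions
        final_log_probs = F.log_softmax(self.ctc_head(x), dim=-1)
        return final_log_probs, inter_log_probs

def total_ctc_loss(final_lp, inter_lps, targets, in_lens, tgt_lens, w=0.5):
    """Original CTC loss on the last layer plus averaged auxiliary CTC losses."""
    main = F.ctc_loss(final_lp.transpose(0, 1), targets, in_lens, tgt_lens,
                      zero_infinity=True)
    if not inter_lps:
        return main
    aux = sum(F.ctc_loss(lp.transpose(0, 1), targets, in_lens, tgt_lens,
                         zero_infinity=True) for lp in inter_lps) / len(inter_lps)
    return (1.0 - w) * main + w * aux
```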
arXiv Detail & Related papers (2021-04-06T18:00:03Z)
- Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition [46.69852287267763]
This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems.
The proposed method significantly reduces recognition errors and emission latency simultaneously.
The best MoChA system shows performance comparable to that of an RNN-transducer (RNN-T).
arXiv Detail & Related papers (2021-02-28T08:17:38Z)
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus, respectively.
arXiv Detail & Related papers (2021-02-05T15:01:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.