Self-distillation Regularized Connectionist Temporal Classification Loss
for Text Recognition: A Simple Yet Effective Approach
- URL: http://arxiv.org/abs/2308.08806v4
- Date: Fri, 29 Dec 2023 11:06:45 GMT
- Title: Self-distillation Regularized Connectionist Temporal Classification Loss
for Text Recognition: A Simple Yet Effective Approach
- Authors: Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min
Wang and Wei Peng
- Abstract summary: We show how to better optimize a text recognition model from the perspective of loss functions.
CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation.
We propose a self-distillation scheme for CTC-based models to address this issue.
- Score: 14.69981874614434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text recognition methods are developing rapidly. Advanced
techniques, e.g., powerful modules, language models, and un- and
semi-supervised learning schemes, continually push performance on public
benchmarks forward. However, the problem of how to better optimize a text
recognition model from the perspective of loss functions is largely overlooked.
CTC-based methods, widely used in practice due to their good balance between
performance and inference speed, still grapple with accuracy degradation. This
is because CTC loss emphasizes the optimization of the entire sequence target
while neglecting to learn individual characters. We propose a self-distillation
scheme for CTC-based models to address this issue. It incorporates a framewise
regularization term into the CTC loss to emphasize individual supervision, and
leverages the maximum a posteriori estimate of the latent alignment to solve the
inconsistency problem that arises in distillation between CTC-based models. We
refer to the regularized CTC loss as Distillation Connectionist Temporal
Classification (DCTC) loss. DCTC loss is module-free, requiring no extra
parameters, no added inference latency, and no additional training data or phases.
Extensive experiments on public benchmarks demonstrate that DCTC can boost text
recognition model accuracy by up to 2.6%, without any of these drawbacks.
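To make the idea concrete, below is a minimal PyTorch-style sketch of a framewise-regularized CTC objective. The function name, the teacher source (e.g., an EMA copy of the model), and the weighting are illustrative assumptions; the paper's actual DCTC formulation additionally resolves alignment inconsistency via the maximum a posteriori latent alignment, which is not reproduced here.

```python
# Minimal sketch (assumed names/weighting): CTC plus a framewise
# self-distillation regularizer. Not the paper's exact DCTC formulation.
import torch
import torch.nn.functional as F

def dctc_style_loss(log_probs, targets, input_lengths, target_lengths,
                    teacher_log_probs, alpha=0.1, blank=0):
    """log_probs: (T, N, C) student log-softmax outputs.
    teacher_log_probs: (T, N, C) frame posteriors from a teacher pass
    (e.g. an EMA copy of the same model), detached from the graph."""
    # Sequence-level CTC objective over the whole target string.
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=blank, zero_infinity=True)
    # Framewise KL term: per-frame supervision that plain CTC lacks.
    kl = F.kl_div(log_probs, teacher_log_probs.detach(),
                  log_target=True, reduction="none").sum(-1).mean()
    return ctc + alpha * kl
```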
Related papers
- CR-CTC: Consistency regularization on CTC for improved speech recognition [18.996929774821822]
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR).
However, it often falls short in recognition performance compared to transducer models or systems combining CTC and an attention-based encoder-decoder (CTC/AED).
We propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram.
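A rough sketch of this consistency idea, assuming a PyTorch `model` and using additive noise as a stand-in for the real augmentation (length changes under time masking are ignored):

```python
import torch
import torch.nn.functional as F

def augment(mel):
    # Stand-in for SpecAugment-style views; the real method differs.
    return mel + 0.1 * torch.randn_like(mel)

def cr_ctc_loss(model, mel, targets, in_lens, tgt_lens, alpha=0.2):
    # Two independently augmented views of the same utterance.
    lp1 = model(augment(mel)).log_softmax(-1)  # (T, N, C)
    lp2 = model(augment(mel)).log_softmax(-1)
    ctc = (F.ctc_loss(lp1, targets, in_lens, tgt_lens, zero_infinity=True) +
           F.ctc_loss(lp2, targets, in_lens, tgt_lens, zero_infinity=True))
    # Symmetric framewise KL between the two CTC posterior sequences.
    kl = (F.kl_div(lp1, lp2.detach(), log_target=True, reduction="none") +
          F.kl_div(lp2, lp1.detach(), log_target=True, reduction="none"))
    return ctc + alpha * kl.sum(-1).mean()
```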
arXiv Detail & Related papers (2024-10-07T14:56:07Z)
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
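As a toy illustration of matching a label sequence against CTC log-probabilities, the function below computes a Viterbi-style CTC alignment score for a single context word over a window of frames. The real method matches against a compact context graph rather than scoring words one by one, so everything here is an assumption.

```python
import numpy as np

def ctc_word_score(log_probs, word_ids, blank=0):
    """Viterbi-style CTC score of one context word over a frame window.
    log_probs: (T, C) per-frame log-softmax outputs."""
    ext = [blank]
    for y in word_ids:               # interleave blanks: [_, y1, _, y2, _]
        ext += [y, blank]
    T, S = log_probs.shape[0], len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            best = alpha[t - 1, s]                     # stay on same state
            if s >= 1:
                best = max(best, alpha[t - 1, s - 1])  # advance one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                best = max(best, alpha[t - 1, s - 2])  # skip over a blank
            alpha[t, s] = best + log_probs[t, ext[s]]
    # Word may end on its last label or the trailing blank.
    return max(alpha[-1, -1], alpha[-1, -2]) if S > 1 else alpha[-1, -1]
```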
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of $5.67\times$.
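A minimal sketch of the two-encoder idea, with all module names, sizes, and layer choices assumed: an acoustic encoder is trained with CTC against the source transcript, and a second encoder on top is trained with CTC against the target translation, keeping decoding non-autoregressive.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoEncoderNAST(nn.Module):
    # Hypothetical dimensions; the actual NAST architecture differs in detail.
    def __init__(self, feat_dim, src_vocab, tgt_vocab, d=256):
        super().__init__()
        self.src_enc = nn.LSTM(feat_dim, d, num_layers=2, batch_first=True)
        self.tgt_enc = nn.LSTM(d, d, num_layers=2, batch_first=True)
        self.src_head = nn.Linear(d, src_vocab)
        self.tgt_head = nn.Linear(d, tgt_vocab)

    def forward(self, feats):
        h_src, _ = self.src_enc(feats)   # (N, T, d)
        h_tgt, _ = self.tgt_enc(h_src)   # (N, T, d)
        return (self.src_head(h_src).log_softmax(-1),
                self.tgt_head(h_tgt).log_softmax(-1))

def nast_loss(model, feats, src, tgt, in_lens, src_lens, tgt_lens):
    lp_src, lp_tgt = model(feats)
    # CTC guides both encoders: source transcript and target translation.
    return (F.ctc_loss(lp_src.transpose(0, 1), src, in_lens, src_lens,
                       zero_infinity=True) +
            F.ctc_loss(lp_tgt.transpose(0, 1), tgt, in_lens, tgt_lens,
                       zero_infinity=True))
```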
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
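Joint CTC/attention training typically interpolates the two objectives; a brief sketch with assumed names and shapes:

```python
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, targets, in_lens, tgt_lens,
                             dec_logits, dec_targets, lam=0.3):
    """Interpolation of an encoder CTC loss with an attention-decoder
    cross-entropy. ctc_log_probs: (T, N, C); dec_logits: (N, L, V)."""
    ctc = F.ctc_loss(ctc_log_probs, targets, in_lens, tgt_lens,
                     zero_infinity=True)
    att = F.cross_entropy(dec_logits.flatten(0, 1), dec_targets.flatten(),
                          ignore_index=-100)  # padded positions masked
    return lam * ctc + (1.0 - lam) * att
```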
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
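A toy sketch of depth-on-demand with intermediate CTC, all layer counts and sizes assumed: a shared CTC head is attached after every block, so training can regularize intermediate layers and inference can exit early at any requested depth.

```python
import torch
import torch.nn as nn

class DepthOnDemandCTC(nn.Module):
    """Hypothetical sketch: shared CTC head on every Transformer block."""
    def __init__(self, d=256, vocab=100, n_layers=12, n_heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d, n_heads, batch_first=True)
             for _ in range(n_layers)])
        self.head = nn.Linear(d, vocab)  # shared CTC projection

    def forward(self, x, depth=None):
        # Training: log-probs at every depth for intermediate CTC losses.
        # Inference: stop early at the requested depth (pruning on demand).
        outs = []
        for i, blk in enumerate(self.blocks):
            x = blk(x)
            outs.append(self.head(x).log_softmax(-1))
            if depth is not None and i + 1 == depth:
                break
        return outs
```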
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
- Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition [46.69852287267763]
This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems.
The proposed method significantly reduces recognition errors and emission latency simultaneously.
The best MoChA system shows performance comparable to that of an RNN transducer (RNN-T).
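The summary does not spell out the mechanism; going only by the title, one heavily hypothetical sketch of alignment knowledge distillation is to push the streaming model's per-token attention toward a teacher alignment (e.g., a frame index per output token from a CTC forced alignment). All names and shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

def alignment_kd_loss(att_weights, teacher_frames):
    """att_weights: (N, L, T) student attention per output token.
    teacher_frames: (N, L) teacher-aligned frame index per token."""
    n, l, t = att_weights.shape
    # Treat the teacher frame as a class label over the T frames.
    logp = torch.log(att_weights.clamp_min(1e-8)).reshape(n * l, t)
    return F.nll_loss(logp, teacher_frames.reshape(n * l))
```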
arXiv Detail & Related papers (2021-02-28T08:17:38Z)
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus, respectively.
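The intermediate-loss idea reduces to applying the same CTC objective to log-probs taken from an intermediate encoder layer; a short sketch with an assumed weighting:

```python
import torch.nn.functional as F

def intermediate_ctc_loss(final_lp, inter_lp, targets, in_lens, tgt_lens,
                          w=0.3):
    """final_lp / inter_lp: (T, N, C) log-probs from the last and an
    intermediate encoder layer; w is an assumed mixing weight."""
    main = F.ctc_loss(final_lp, targets, in_lens, tgt_lens, zero_infinity=True)
    inter = F.ctc_loss(inter_lp, targets, in_lens, tgt_lens, zero_infinity=True)
    return (1 - w) * main + w * inter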
arXiv Detail & Related papers (2021-02-05T15:01:03Z)
- Boosting Continuous Sign Language Recognition via Cross Modality Augmentation [135.30357113518127]
Continuous sign language recognition deals with unaligned video-text pairs.
We propose a novel architecture with cross-modality augmentation.
The proposed framework can be easily extended to other existing CTC-based continuous SLR architectures.
arXiv Detail & Related papers (2020-10-11T15:07:50Z)
- Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss [5.707652271634435]
We propose the Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistency.
CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path.
Compared to the same model trained with the regular CTC loss, our method consistently improves ASR performance on both code-switching and monolingual corpora.
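Reading the description literally, here is one crude sketch of deriving context targets from the model's own estimated path (no frame-level alignments needed). It is a simplification: the actual CCTC predicts surrounding characters along the collapsed path, and all names here are assumptions.

```python
import torch
import torch.nn.functional as F

def cctc_context_loss(main_log_probs, left_ctx_logits):
    """main_log_probs: (T, N, C) main CTC head outputs.
    left_ctx_logits: (T, N, C) auxiliary head predicting, per frame, the
    previous frame's label on the greedy path (a crude context proxy)."""
    with torch.no_grad():
        path = main_log_probs.argmax(-1)           # (T, N) estimated path
        left = torch.roll(path, shifts=1, dims=0)  # previous frame's label
        left[0] = 0                                # assume blank before t=0
    return F.cross_entropy(left_ctx_logits[1:].flatten(0, 1),
                           left[1:].flatten())
```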
arXiv Detail & Related papers (2020-05-16T09:36:58Z)
- GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition [27.38969404322089]
We propose guided training of a CTC model, where the CTC model learns better alignments and feature representations from a more powerful attentional guidance.
With the benefit of guided training, the CTC model achieves robust and accurate prediction for both regular and irregular scene text.
To further leverage the potential of the CTC decoder, a graph convolutional network (GCN) is proposed to learn the local correlations of extracted features.
arXiv Detail & Related papers (2020-02-04T13:26:14Z)