CTC Alignments Improve Autoregressive Translation
- URL: http://arxiv.org/abs/2210.05200v1
- Date: Tue, 11 Oct 2022 07:13:50 GMT
- Title: CTC Alignments Improve Autoregressive Translation
- Authors: Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian
Metze, Alan W Black, Shinji Watanabe
- Abstract summary: We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
- Score: 145.90587287444976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Connectionist Temporal Classification (CTC) is a widely used approach for
automatic speech recognition (ASR) that performs conditionally independent
monotonic alignment. However, for translation, CTC exhibits clear limitations
due to the contextual and non-monotonic nature of the task and thus lags behind
attentional decoder approaches in terms of translation quality. In this work,
we argue that CTC does in fact make sense for translation if applied in a joint
CTC/attention framework wherein CTC's core properties can counteract several
key weaknesses of pure-attention models during training and decoding. To
validate this conjecture, we modify the Hybrid CTC/Attention model originally
proposed for ASR to support text-to-text translation (MT) and speech-to-text
translation (ST). Our proposed joint CTC/attention models outperform
pure-attention baselines across six benchmark translation tasks.
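Concretely, the hybrid framework trains a shared encoder with a multi-task objective that interpolates the CTC loss with the attention decoder's cross-entropy, and combines both scores again at decoding time. A minimal PyTorch sketch of the training interpolation, where the weight lam, the blank id, and the padding convention are illustrative assumptions rather than values from the paper:

```python
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, att_logits, src_lens, ys, ys_lens,
                             lam=0.3, pad_id=-1):
    # CTC branch: conditionally independent, monotonic alignment over encoder
    # frames; ctc_log_probs is (T, N, V) and id 0 is assumed to be the blank.
    # Padded targets (pad_id) are clamped to 0 but ignored via ys_lens.
    loss_ctc = F.ctc_loss(ctc_log_probs, ys.clamp(min=0), src_lens, ys_lens,
                          blank=0, zero_infinity=True)
    # Attention branch: autoregressive cross-entropy; att_logits is (N, U, V)
    # and padded target positions carry pad_id so they are ignored.
    loss_att = F.cross_entropy(att_logits.transpose(1, 2), ys, ignore_index=pad_id)
    # Interpolate the two objectives; decoding analogously mixes both scores.
    return lam * loss_ctc + (1.0 - lam) * loss_att
```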
Related papers
- CR-CTC: Consistency regularization on CTC for improved speech recognition [18.996929774821822]
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR).
However, it often falls short in recognition performance compared to transducer models or systems combining CTC and an attention-based encoder-decoder (CTC/AED).
We propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram.
arXiv Detail & Related papers (2024-10-07T14:56:07Z)
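For intuition, the consistency term above can be sketched as the two augmented views' CTC losses plus a symmetric KL penalty between their frame-level posteriors; the weight alpha and the tensor shapes are illustrative assumptions, not values from the paper:

```python
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, ys, src_lens, ys_lens, alpha=0.2):
    # Standard CTC loss on each augmented view; log-probs are shaped (T, N, V).
    ctc = 0.5 * (F.ctc_loss(log_probs_a, ys, src_lens, ys_lens, zero_infinity=True)
                 + F.ctc_loss(log_probs_b, ys, src_lens, ys_lens, zero_infinity=True))
    # Symmetric KL between the two frame-level CTC posteriors, each direction
    # treating the other view's (detached) distribution as the target.
    kl = 0.5 * (F.kl_div(log_probs_a, log_probs_b.detach(),
                         reduction="batchmean", log_target=True)
                + F.kl_div(log_probs_b, log_probs_a.detach(),
                           reduction="batchmean", log_target=True))
    return ctc + alpha * kl
```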
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
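As a toy illustration of the detection step above (the paper matches log-probabilities against a compact context graph, with more machinery than shown here), one can collapse the greedy CTC best path and scan it for context phrases given as token-id tuples:

```python
import torch

def spot_context_words(ctc_log_probs, phrases, blank=0):
    # Greedy best path over (T, V) log-probabilities.
    path = ctc_log_probs.argmax(dim=-1).tolist()
    collapsed, prev = [], blank
    for t in path:                        # CTC collapse: drop repeats and blanks
        if t != blank and t != prev:
            collapsed.append(t)
        prev = t
    hits = []
    for phrase in phrases:                # naive scan for each context phrase
        n = len(phrase)
        for i in range(len(collapsed) - n + 1):
            if tuple(collapsed[i:i + n]) == tuple(phrase):
                hits.append(phrase)
    return hits
```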
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67×.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
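A toy skeleton of the two-encoder layout described above: a first encoder carries a CTC head over source transcripts and a second, stacked encoder carries a CTC head that emits the translation non-autoregressively. All module sizes and names are illustrative assumptions:

```python
import torch.nn as nn

class NASTSketch(nn.Module):
    def __init__(self, d=256, src_vocab=5000, tgt_vocab=5000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.src_encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.tgt_encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.src_ctc = nn.Linear(d, src_vocab)   # CTC head over source text
        self.tgt_ctc = nn.Linear(d, tgt_vocab)   # CTC head over target text

    def forward(self, feats):                    # feats: (N, T, d) speech features
        h_src = self.src_encoder(feats)
        h_tgt = self.tgt_encoder(h_src)
        return (self.src_ctc(h_src).log_softmax(-1),
                self.tgt_ctc(h_tgt).log_softmax(-1))
```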
- Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation [48.203394370942505]
We re-examine whether genuine vocabulary labels are needed when Connectionist Temporal Classification (CTC) is used for regularization.
We propose coarse labeling for CTC, which merges vocabulary labels via simple rules such as truncation, division, or modulo (MOD) operations.
We show that CoLaCTC successfully generalizes to CTC regularization regardless of whether transcripts or translations are used for labeling.
arXiv Detail & Related papers (2023-02-21T18:38:41Z)
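The MOD rule above amounts to folding vocabulary ids into a small coarse space before computing the auxiliary CTC loss; a one-function sketch, assuming id 0 is reserved for the CTC blank:

```python
import torch

def coarsen_labels(ys: torch.Tensor, num_coarse: int) -> torch.Tensor:
    # CoLaCTC-style MOD rule (sketch): fold vocabulary ids 1..|V|-1 into
    # 1..num_coarse-1, keeping id 0 free for the CTC blank.
    return 1 + (ys - 1) % (num_coarse - 1)
```

Under this setup, the coarse labels would feed only the CTC regularizer, while the main translation branch keeps the full vocabulary.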
- BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model [40.16332045057132]
BERT-CTC is a novel formulation of end-to-end speech recognition.
It incorporates linguistic knowledge through the explicit output dependency obtained from BERT contextual embeddings.
BERT-CTC improves over conventional approaches across variations in speaking styles and languages.
arXiv Detail & Related papers (2022-10-29T18:19:44Z)
- Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation [62.943925893616196]
We study the possibility of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC).
CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability.
Our analysis shows that transformer encoders have the ability to change the word order.
arXiv Detail & Related papers (2021-05-11T07:48:45Z)
- Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus.
arXiv Detail & Related papers (2021-02-05T15:01:03Z)
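The regularizer above amounts to interpolating the final-layer CTC loss with a CTC loss attached to an intermediate encoder layer; a minimal sketch, with the mixing weight w an illustrative assumption:

```python
import torch.nn.functional as F

def interctc_loss(final_log_probs, inter_log_probs, ys, src_lens, ys_lens, w=0.3):
    # Both terms are ordinary CTC losses over (T, N, V) log-probabilities; the
    # second is computed from an intermediate encoder layer's own projection.
    l_final = F.ctc_loss(final_log_probs, ys, src_lens, ys_lens, zero_infinity=True)
    l_inter = F.ctc_loss(inter_log_probs, ys, src_lens, ys_lens, zero_infinity=True)
    return (1.0 - w) * l_final + w * l_inter
```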
- Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss [5.707652271634435]
We propose a Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistency.
CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path.
Compared to the same model trained with the regular CTC loss, our method consistently improved ASR performance on both code-switching (CS) and monolingual corpora.
arXiv Detail & Related papers (2020-05-16T09:36:58Z)
- CTC-synchronous Training for Monotonic Attention Model [43.0382262234792]
In monotonic chunkwise attention (MoChA), backward probabilities cannot be leveraged in the alignment process during training due to the left-to-right dependency in the decoder.
We propose CTC-synchronous training (CTC-ST), in which MoChA uses CTC alignments to learn optimal monotonic alignments.
The entire model is jointly optimized so that the expected boundaries from MoChA are synchronized with the alignments.
arXiv Detail & Related papers (2020-05-10T16:48:23Z)