CTC-based Non-autoregressive Speech Translation
- URL: http://arxiv.org/abs/2305.17358v1
- Date: Sat, 27 May 2023 03:54:09 GMT
- Title: CTC-based Non-autoregressive Speech Translation
- Authors: Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun
Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma and Jingbo
Zhu
- Abstract summary: We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$.
- Score: 51.37920141751813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combining end-to-end speech translation (ST) and non-autoregressive (NAR)
generation is promising in language and speech processing, thanks to their advantages
of less error propagation and low latency. In this paper, we investigate the
potential of connectionist temporal classification (CTC) for non-autoregressive
speech translation (NAST). In particular, we develop a model consisting of two
encoders that are guided by CTC to predict the source and target texts,
respectively. Introducing CTC into NAST on both language sides has obvious
challenges: 1) the conditionally independent generation somewhat breaks the
interdependency among tokens, and 2) the monotonic alignment assumption in
standard CTC does not hold in translation tasks. In response, we develop a
prediction-aware encoding approach and a cross-layer attention approach to
address these issues. We also use curriculum learning to improve convergence of
training. Experiments on the MuST-C ST benchmarks show that our NAST model
achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$, which
is comparable to the autoregressive counterpart and even outperforms the
previous best result by 0.9 BLEU points.
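The abstract describes the dual-encoder design only in prose, so here is a minimal PyTorch sketch of the core idea: one encoder stack supervised by CTC against the source transcript, a second stack on top of it supervised by CTC against the target translation. This is illustrative only, not the authors' released code; it assumes pre-extracted speech features, and it omits the paper's prediction-aware encoding, cross-layer attention, and curriculum learning. All module names and sizes are hypothetical.

```python
# Minimal sketch of CTC-guided dual encoders for NAST (illustrative only).
import torch
import torch.nn as nn

class DualEncoderNAST(nn.Module):
    def __init__(self, d_model=256, src_vocab=1000, tgt_vocab=1000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.acoustic_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.textual_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.src_head = nn.Linear(d_model, src_vocab)  # predicts source text
        self.tgt_head = nn.Linear(d_model, tgt_vocab)  # predicts target text
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, in_lens, src, src_lens, tgt, tgt_lens):
        h_src = self.acoustic_encoder(feats)   # (B, T, d) source-side states
        h_tgt = self.textual_encoder(h_src)    # target-side states, same length T
        src_lp = self.src_head(h_src).log_softmax(-1).transpose(0, 1)  # (T, B, V)
        tgt_lp = self.tgt_head(h_tgt).log_softmax(-1).transpose(0, 1)
        # CTC on both language sides; the non-monotonic target side is the
        # part that cross-layer attention is meant to compensate for.
        return (self.ctc(src_lp, src, in_lens, src_lens)
                + self.ctc(tgt_lp, tgt, in_lens, tgt_lens))

# Toy usage: batch of 2, 50 feature frames, 10 source / 12 target tokens.
model = DualEncoderNAST()
loss = model(torch.randn(2, 50, 256), torch.tensor([50, 50]),
             torch.randint(1, 1000, (2, 10)), torch.tensor([10, 10]),
             torch.randint(1, 1000, (2, 12)), torch.tensor([12, 12]))
```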
Related papers
- CTC-based Non-autoregressive Textless Speech-to-Speech Translation [38.99922762754443]
Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding.
Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet their translation quality typically lags significantly behind that of autoregressive (AR) models.
In this paper, we investigate the performance of CTC-based NAR models in S2ST, as these models have shown impressive results in machine translation.
arXiv Detail & Related papers (2024-06-11T15:00:33Z) - Markovian Transformers for Informative Language Modeling [0.9642500063568188]
Chain-of-Thought (CoT) reasoning holds great promise for explaining the outputs of language models.
Recent studies have highlighted significant challenges in its practical application for interpretability.
We propose a technique to factor next-token prediction through intermediate CoT text, ensuring the CoT is causally load-bearing.
arXiv Detail & Related papers (2024-04-29T17:36:58Z) - Bridging the Gaps of Both Modality and Language: Synchronous Bilingual
CTC for Speech Translation and Speech Recognition [46.41096278421193]
BiL-CTC+ bridges the gap between audio and text as well as between source and target languages.
Our method also yields significant improvements in speech recognition performance.
arXiv Detail & Related papers (2023-09-21T16:28:42Z) - Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z) - CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
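As a sketch of what such a joint objective looks like, the encoder's CTC loss can be interpolated with the attention decoder's cross-entropy. The interpolation weight and shapes below are illustrative, not the paper's exact recipe, and padding handling is omitted for brevity.

```python
# Illustrative joint CTC/attention objective (lam=0.3 is just an example).
import torch
import torch.nn.functional as F

def joint_ctc_attention_loss(enc_log_probs, dec_logits, tgt, in_lens, tgt_lens, lam=0.3):
    # enc_log_probs: (T, B, V) log-probabilities from the encoder's CTC head
    # dec_logits:    (B, S, V) logits from the attention decoder
    ctc = F.ctc_loss(enc_log_probs, tgt, in_lens, tgt_lens, blank=0, zero_infinity=True)
    ce = F.cross_entropy(dec_logits.transpose(1, 2), tgt)  # (B, V, S) vs (B, S)
    return lam * ctc + (1.0 - lam) * ce
```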
arXiv Detail & Related papers (2022-10-11T07:13:50Z) - Non-Autoregressive Neural Machine Translation: A Call for Clarity [3.1447111126465]
We take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models.
We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants.
We contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks.
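For reference, the standardized metrics this entry names can be computed with the sacreBLEU Python API roughly as follows (a sketch assuming sacrebleu 2.x; the strings are toy data, not the paper's test sets):

```python
# Toy example of standardized BLEU, chrF++, and TER scoring with sacreBLEU.
import sacrebleu

hyps = ["the cat sat on the mat"]
refs = [["the cat sat on a mat"]]  # one reference stream, one ref per hypothesis

bleu = sacrebleu.corpus_bleu(hyps, refs)
chrf = sacrebleu.corpus_chrf(hyps, refs, word_order=2)  # word_order=2 -> chrF++
ter = sacrebleu.corpus_ter(hyps, refs)
print(f"BLEU={bleu.score:.1f} chrF++={chrf.score:.1f} TER={ter.score:.1f}")
```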
arXiv Detail & Related papers (2022-05-21T12:15:22Z) - Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in
Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z) - Investigating the Reordering Capability in CTC-based Non-Autoregressive
End-to-End Speech Translation [62.943925893616196]
We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC).
CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability.
Our analysis shows that transformer encoders have the ability to change the word order.
arXiv Detail & Related papers (2021-05-11T07:48:45Z) - Orthros: Non-autoregressive End-to-end Speech Translation with
Dual-decoder [64.55176104620848]
We propose a novel NAR E2E-ST framework, Orthros, in which both NAR and autoregressive (AR) decoders are jointly trained on the shared speech encoder.
The latter selects a better translation from among candidates of various lengths generated by the former, which dramatically improves the effectiveness of a large length beam with negligible overhead.
Experiments on four benchmarks show the effectiveness of the proposed method in improving inference speed while maintaining competitive translation quality.
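A toy sketch of this selection scheme follows; the generate/score callables are stand-ins for the NAR and AR decoders, not the paper's actual interfaces.

```python
# Orthros-style length-beam selection, as a toy: the NAR decoder emits one
# hypothesis per candidate length, and the jointly trained AR decoder
# rescores them; the highest-scoring hypothesis is returned.
from typing import Callable, List

def select_by_length_beam(nar_generate: Callable[[int], str],
                          ar_score: Callable[[str], float],
                          lengths: List[int]) -> str:
    candidates = [nar_generate(n) for n in lengths]  # cheap; parallel in practice
    return max(candidates, key=ar_score)             # single AR rescoring pass

# Dummy usage: three candidate lengths, a scorer that prefers 4-word outputs.
hyps = {8: "wir danken ihnen", 9: "wir danken ihnen sehr", 10: "wir danken euch allen"}
print(select_by_length_beam(lambda n: hyps[n],
                            lambda h: -abs(len(h.split()) - 4), [8, 9, 10]))
```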
arXiv Detail & Related papers (2020-10-25T06:35:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.