Investigating the Reordering Capability in CTC-based Non-Autoregressive
End-to-End Speech Translation
- URL: http://arxiv.org/abs/2105.04840v1
- Date: Tue, 11 May 2021 07:48:45 GMT
- Title: Investigating the Reordering Capability in CTC-based Non-Autoregressive
End-to-End Speech Translation
- Authors: Shun-Po Chuang, Yung-Sung Chuang, Chih-Chiang Chang, Hung-yi Lee
- Abstract summary: We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC).
CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability.
Our analysis shows that transformer encoders have the ability to change the word order.
- Score: 62.943925893616196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the possibilities of building a non-autoregressive speech-to-text
translation model using connectionist temporal classification (CTC), and use
CTC-based automatic speech recognition as an auxiliary task to improve the
performance. CTC's success on translation is counter-intuitive due to its
monotonicity assumption, so we analyze its reordering capability. Kendall's tau
distance is introduced as the quantitative metric, and gradient-based
visualization provides an intuitive way to take a closer look into the model.
Our analysis shows that transformer encoders have the ability to change the
word order, and it points out future research directions worth exploring
further for non-autoregressive speech translation.
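As a concrete illustration of the metric, the sketch below computes a normalized Kendall's tau distance over a permutation of aligned word positions: the fraction of word pairs whose relative order is inverted. The toy permutation is an invented example, not a number from the paper.

```python
from itertools import combinations

def kendall_tau_distance(order):
    """Normalized Kendall's tau distance of a permutation: the fraction of
    pairs (i, j), i < j, with order[i] > order[j]. 0.0 means the aligned
    words stay monotonic; 1.0 means their order is fully reversed."""
    pairs = list(combinations(range(len(order)), 2))
    inversions = sum(1 for i, j in pairs if order[i] > order[j])
    return inversions / len(pairs) if pairs else 0.0

# Hypothetical alignment: the order in which four aligned source words
# appear on the target side (source word 1 moves toward the end).
print(kendall_tau_distance([0, 3, 1, 2]))  # 2 inversions / 6 pairs = 0.333...
```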
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
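The word spotter above matches a whole context graph of phrases during greedy CTC decoding; as a loose, single-phrase illustration of the matching step, the sketch below scores one biasing phrase against a matrix of per-frame CTC log-probabilities with a relaxed Viterbi recursion (blank and repeated frames are absorbed by a "stay" transition, and CTC's rule requiring a blank between repeated symbols is ignored for brevity). The names and the relaxation are mine, not the authors' implementation.

```python
import numpy as np

def phrase_match_score(log_probs, tokens, blank=0):
    """Best log-score of a path that emits `tokens` in order somewhere in
    the (T, V) grid of per-frame CTC log-probabilities.
    dp[j] = best score with the first j tokens emitted so far; each frame
    either advances to the next token or stays on a blank/repeated frame."""
    T, _ = log_probs.shape
    dp = np.full(len(tokens) + 1, -np.inf)
    dp[0] = 0.0
    for t in range(T):
        nxt = np.empty_like(dp)
        nxt[0] = dp[0] + log_probs[t, blank]  # wait before the phrase starts
        for j in range(1, len(tokens) + 1):
            stay = dp[j] + max(log_probs[t, blank], log_probs[t, tokens[j - 1]])
            advance = dp[j - 1] + log_probs[t, tokens[j - 1]]
            nxt[j] = max(stay, advance)
        dp = nxt
    return dp[-1]

# Usage: compare against a threshold (or the greedy CTC path score) to decide
# whether the biasing phrase is a plausible candidate in this utterance.
rng = np.random.default_rng(0)
log_probs = np.log(rng.dirichlet(np.ones(10), size=50))  # 50 frames, 10 symbols
print(phrase_match_score(log_probs, tokens=[3, 7, 2]))
```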
- Markovian Transformers for Informative Language Modeling [0.9642500063568188]
Chain-of-Thought (CoT) reasoning holds great promise for explaining the outputs of language models.
Recent studies have highlighted significant challenges in its practical application for interpretability.
We propose a technique to factor next-token prediction through intermediate CoT text, ensuring the CoT is causally load-bearing.
arXiv Detail & Related papers (2024-04-29T17:36:58Z)
- Unimodal Aggregation for CTC-based Speech Recognition [7.6112706449833505]
A unimodal aggregation (UMA) is proposed to segment and integrate the feature frames that belong to the same text token.
UMA learns better feature representations and shortens the sequence length, resulting in lower recognition error and computational complexity.
arXiv Detail & Related papers (2023-09-15T04:34:40Z)
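Reading the summary above, one plausible toy rendering of unimodal aggregation is: a scalar weight per frame that rises and then falls within a token, segment boundaries placed at the valleys where the weight starts rising again, and a weighted average of the frames inside each segment. The sketch below is my own simplification with invented shapes, not the authors' code.

```python
import torch

def unimodal_aggregate(frames, weights):
    """frames: (T, D) encoder outputs; weights: (T,) positive scalars.
    Start a new segment at every valley of the weight curve (the weight
    stops falling and starts rising), then merge each segment with a
    weight-normalized average."""
    T, _ = frames.shape
    boundaries = [0]
    for t in range(1, T - 1):
        if weights[t] <= weights[t - 1] and weights[t] < weights[t + 1]:
            boundaries.append(t)
    boundaries.append(T)
    segments = []
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        w = weights[s:e] / weights[s:e].sum()
        segments.append((w.unsqueeze(1) * frames[s:e]).sum(dim=0))
    return torch.stack(segments)  # one vector per estimated token

frames = torch.randn(12, 8)
weights = torch.sigmoid(torch.randn(12))  # e.g. sigmoid of a learned projection
print(unimodal_aggregate(frames, weights).shape)
```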
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
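A minimal sketch of the two-encoder layout the summary describes, assuming plain Transformer encoder blocks, a shared vocabulary, and PyTorch's nn.CTCLoss: one CTC head supervises the source transcript after an acoustic encoder, and a second head supervises the translation after a further encoder stack. Dimensions, layer counts, and names are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoCTCEncoders(nn.Module):
    """Speech features -> acoustic encoder -> CTC on source text,
    then a second encoder -> CTC on target text, all non-autoregressive."""
    def __init__(self, feat_dim=80, d_model=256, vocab=1000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.acoustic = nn.TransformerEncoder(layer, num_layers=2)
        self.semantic = nn.TransformerEncoder(layer, num_layers=2)
        self.src_head = nn.Linear(d_model, vocab)
        self.tgt_head = nn.Linear(d_model, vocab)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, feat_lens, src, src_lens, tgt, tgt_lens):
        h = self.acoustic(self.proj(feats))  # (B, T, d): acoustic states
        g = self.semantic(h)                 # (B, T, d): reordered states
        src_lp = F.log_softmax(self.src_head(h), -1).transpose(0, 1)  # (T, B, V)
        tgt_lp = F.log_softmax(self.tgt_head(g), -1).transpose(0, 1)
        return (self.ctc(src_lp, src, feat_lens, src_lens)     # auxiliary ASR CTC
                + self.ctc(tgt_lp, tgt, feat_lens, tgt_lens))  # translation CTC

model = TwoCTCEncoders()
loss = model(torch.randn(2, 50, 80), torch.tensor([50, 50]),
             torch.randint(1, 1000, (2, 12)), torch.tensor([12, 12]),
             torch.randint(1, 1000, (2, 10)), torch.tensor([10, 10]))
```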
- A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition [26.79184118279807]
We present a CTC Alignment-based Single-Step Non-Autoregressive Transformer (CASS-NAT) for end-to-end ASR.
Word embeddings in the autoregressive transformer (AT) are substituted with token-level acoustic embeddings (TAE) that are extracted from encoder outputs.
We find that CASS-NAT has a WER that is close to AT on various ASR tasks, while providing a 24x inference speedup.
arXiv Detail & Related papers (2023-04-15T18:34:29Z)
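As a rough illustration of token-level acoustic embeddings, the sketch below takes a frame-level greedy CTC alignment, collapses it CTC-style (merging consecutive equal labels and dropping blanks), and mean-pools the encoder outputs inside each surviving run; how CASS-NAT actually extracts its TAEs may differ in detail.

```python
import torch

def token_acoustic_embeddings(encoder_out, frame_labels, blank=0):
    """encoder_out: (T, D) encoder states; frame_labels: (T,) greedy CTC ids.
    Returns one mean-pooled vector per emitted token after CTC collapsing."""
    labels = frame_labels.tolist()
    embeddings, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:  # a run just ended
            if labels[start] != blank:
                embeddings.append(encoder_out[start:t].mean(dim=0))
            start = t
    return torch.stack(embeddings)

enc = torch.randn(10, 4)
align = torch.tensor([0, 5, 5, 0, 0, 9, 9, 9, 0, 5])   # blank = 0
print(token_acoustic_embeddings(enc, align).shape)       # (3, 4): tokens 5, 9, 5
```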
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
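Joint CTC/attention models are typically trained with an interpolated objective, L = λ·L_CTC + (1 − λ)·L_attention; the snippet below shows only that training-side combination on placeholder tensors, with λ = 0.3 as an illustrative value rather than the paper's setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

lam = 0.3  # CTC weight; illustrative, not taken from the paper
T, B, S, V = 50, 2, 7, 100  # frames, batch, target length, vocab size

# Placeholder model outputs: encoder log-probs (T, B, V), decoder logits (B, S, V).
enc_log_probs = F.log_softmax(torch.randn(T, B, V), dim=-1)
dec_logits = torch.randn(B, S, V)
targets = torch.randint(1, V, (B, S))  # 0 is reserved for the CTC blank

ctc = nn.CTCLoss(blank=0, zero_infinity=True)(
    enc_log_probs, targets, torch.full((B,), T), torch.full((B,), S))
att = F.cross_entropy(dec_logits.reshape(-1, V), targets.reshape(-1))
loss = lam * ctc + (1 - lam) * att  # joint CTC/attention objective
print(loss.item())
```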
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
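One plausible shape for such a mechanism is additive attention pooling over the tokens of a phrase span: score each token with a small learned network, softmax the scores, and take the weighted sum as the phrase representation. The sketch below is a guess at that flavor, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    """Collapse the token vectors of one phrase span into a single phrase
    vector via learned additive attention (a simplified stand-in)."""
    def __init__(self, d_model=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_model, d_model), nn.Tanh(), nn.Linear(d_model, 1))

    def forward(self, tokens):                           # (phrase_len, d_model)
        attn = torch.softmax(self.score(tokens), dim=0)  # (phrase_len, 1)
        return (attn * tokens).sum(dim=0)                # (d_model,)

pool = AttentivePhrasePooling(d_model=8)
print(pool(torch.randn(3, 8)).shape)  # torch.Size([8])
```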