Investigating the Reordering Capability in CTC-based Non-Autoregressive
End-to-End Speech Translation
- URL: http://arxiv.org/abs/2105.04840v1
- Date: Tue, 11 May 2021 07:48:45 GMT
- Title: Investigating the Reordering Capability in CTC-based Non-Autoregressive
End-to-End Speech Translation
- Authors: Shun-Po Chuang, Yung-Sung Chuang, Chih-Chiang Chang, Hung-yi Lee
- Abstract summary: We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC).
CTC's success on translation is counter-intuitive due to its monotonicity assumption, so we analyze its reordering capability.
Our analysis shows that transformer encoders have the ability to change the word order.
- Score: 62.943925893616196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the possibilities of building a non-autoregressive speech-to-text
translation model using connectionist temporal classification (CTC), and use
CTC-based automatic speech recognition as an auxiliary task to improve the
performance. CTC's success on translation is counter-intuitive due to its
monotonicity assumption, so we analyze its reordering capability. Kendall's tau
distance is introduced as the quantitative metric, and gradient-based
visualization provides an intuitive way to take a closer look into the model.
Our analysis shows that transformer encoders have the ability to change the
word order, and it points out future research directions worth exploring
further for non-autoregressive speech translation.
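As a concrete illustration of the metric, the sketch below computes a normalized Kendall's tau distance over a permutation of aligned word positions: the fraction of word pairs whose relative order is inverted. The toy permutation is an invented example, not a number from the paper.

```python
from itertools import combinations

def kendall_tau_distance(order):
    """Normalized Kendall's tau distance of a permutation: the fraction of
    pairs (i, j), i < j, with order[i] > order[j]. 0.0 means the aligned
    words stay monotonic; 1.0 means their order is fully reversed."""
    pairs = list(combinations(range(len(order)), 2))
    inversions = sum(1 for i, j in pairs if order[i] > order[j])
    return inversions / len(pairs) if pairs else 0.0

# Hypothetical alignment: the order in which four aligned source words
# appear on the target side (source word 1 moves toward the end).
print(kendall_tau_distance([0, 3, 1, 2]))  # 2 inversions / 6 pairs = 0.333...
```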
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with a CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
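The word spotter above matches a whole context graph of phrases during greedy CTC decoding; as a loose, single-phrase illustration of the matching step, the sketch below scores one biasing phrase against a matrix of per-frame CTC log-probabilities with a relaxed Viterbi recursion (blank and repeated frames are absorbed by a "stay" transition, and CTC's rule requiring a blank between repeated symbols is ignored for brevity). The names and the relaxation are mine, not the authors' implementation.

```python
import numpy as np

def phrase_match_score(log_probs, tokens, blank=0):
    """Best log-score of a path that emits `tokens` in order somewhere in
    the (T, V) grid of per-frame CTC log-probabilities.
    dp[j] = best score with the first j tokens emitted so far; each frame
    either advances to the next token or stays on a blank/repeated frame."""
    T, _ = log_probs.shape
    dp = np.full(len(tokens) + 1, -np.inf)
    dp[0] = 0.0
    for t in range(T):
        nxt = np.empty_like(dp)
        nxt[0] = dp[0] + log_probs[t, blank]  # wait before the phrase starts
        for j in range(1, len(tokens) + 1):
            stay = dp[j] + max(log_probs[t, blank], log_probs[t, tokens[j - 1]])
            advance = dp[j - 1] + log_probs[t, tokens[j - 1]]
            nxt[j] = max(stay, advance)
        dp = nxt
    return dp[-1]

# Usage: compare against a threshold (or the greedy CTC path score) to decide
# whether the biasing phrase is a plausible candidate in this utterance.
rng = np.random.default_rng(0)
log_probs = np.log(rng.dirichlet(np.ones(10), size=50))  # 50 frames, 10 symbols
print(phrase_match_score(log_probs, tokens=[3, 7, 2]))
```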
- Markovian Transformers for Informative Language Modeling [0.9642500063568188]
Chain-of-Thought (CoT) reasoning holds great promise for explaining the outputs of language models.
Recent studies have highlighted significant challenges in its practical application for interpretability.
We propose a technique to factor next-token prediction through intermediate CoT text, ensuring the CoT is causally load-bearing.
arXiv Detail & Related papers (2024-04-29T17:36:58Z)
- Unimodal Aggregation for CTC-based Speech Recognition [7.6112706449833505]
A unimodal aggregation (UMA) is proposed to segment and integrate the feature frames that belong to the same text token.
UMA learns better feature representations and shortens the sequence length, resulting in lower recognition error and computational complexity.
arXiv Detail & Related papers (2023-09-15T04:34:40Z)
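Reading the summary above, one plausible toy rendering of unimodal aggregation is: a scalar weight per frame that rises and then falls within a token, segment boundaries placed at the valleys where the weight starts rising again, and a weighted average of the frames inside each segment. The sketch below is my own simplification with invented shapes, not the authors' code.

```python
import torch

def unimodal_aggregate(frames, weights):
    """frames: (T, D) encoder outputs; weights: (T,) positive scalars.
    Start a new segment at every valley of the weight curve (the weight
    stops falling and starts rising), then merge each segment with a
    weight-normalized average."""
    T, _ = frames.shape
    boundaries = [0]
    for t in range(1, T - 1):
        if weights[t] <= weights[t - 1] and weights[t] < weights[t + 1]:
            boundaries.append(t)
    boundaries.append(T)
    segments = []
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        w = weights[s:e] / weights[s:e].sum()
        segments.append((w.unsqueeze(1) * frames[s:e]).sum(dim=0))
    return torch.stack(segments)  # one vector per estimated token

frames = torch.randn(12, 8)
weights = torch.sigmoid(torch.randn(12))  # e.g. sigmoid of a learned projection
print(unimodal_aggregate(frames, weights).shape)
```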
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
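A minimal sketch of the two-encoder layout the summary describes, assuming plain Transformer encoder blocks, a shared vocabulary, and PyTorch's nn.CTCLoss: one CTC head supervises the source transcript after an acoustic encoder, and a second head supervises the translation after a further encoder stack. Dimensions, layer counts, and names are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoCTCEncoders(nn.Module):
    """Speech features -> acoustic encoder -> CTC on source text,
    then a second encoder -> CTC on target text, all non-autoregressive."""
    def __init__(self, feat_dim=80, d_model=256, vocab=1000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.acoustic = nn.TransformerEncoder(layer, num_layers=2)
        self.semantic = nn.TransformerEncoder(layer, num_layers=2)
        self.src_head = nn.Linear(d_model, vocab)
        self.tgt_head = nn.Linear(d_model, vocab)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, feat_lens, src, src_lens, tgt, tgt_lens):
        h = self.acoustic(self.proj(feats))  # (B, T, d): acoustic states
        g = self.semantic(h)                 # (B, T, d): reordered states
        src_lp = F.log_softmax(self.src_head(h), -1).transpose(0, 1)  # (T, B, V)
        tgt_lp = F.log_softmax(self.tgt_head(g), -1).transpose(0, 1)
        return (self.ctc(src_lp, src, feat_lens, src_lens)     # auxiliary ASR CTC
                + self.ctc(tgt_lp, tgt, feat_lens, tgt_lens))  # translation CTC

model = TwoCTCEncoders()
loss = model(torch.randn(2, 50, 80), torch.tensor([50, 50]),
             torch.randint(1, 1000, (2, 12)), torch.tensor([12, 12]),
             torch.randint(1, 1000, (2, 10)), torch.tensor([10, 10]))
```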
- A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition [26.79184118279807]
We present a CTC Alignment-based Single-Step Non-Autoregressive Transformer (CASS-NAT) for end-to-end ASR.
Word embeddings in the autoregressive transformer (AT) are substituted with token-level acoustic embeddings (TAE) that are extracted from encoder outputs.
We find that CASS-NAT has a WER that is close to AT on various ASR tasks, while providing a 24x inference speedup.
arXiv Detail & Related papers (2023-04-15T18:34:29Z)
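As a rough illustration of token-level acoustic embeddings, the sketch below takes a frame-level greedy CTC alignment, collapses it CTC-style (merging consecutive equal labels and dropping blanks), and mean-pools the encoder outputs inside each surviving run; how CASS-NAT actually extracts its TAEs may differ in detail.

```python
import torch

def token_acoustic_embeddings(encoder_out, frame_labels, blank=0):
    """encoder_out: (T, D) encoder states; frame_labels: (T,) greedy CTC ids.
    Returns one mean-pooled vector per emitted token after CTC collapsing."""
    labels = frame_labels.tolist()
    embeddings, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:  # a run just ended
            if labels[start] != blank:
                embeddings.append(encoder_out[start:t].mean(dim=0))
            start = t
    return torch.stack(embeddings)

enc = torch.randn(10, 4)
align = torch.tensor([0, 5, 5, 0, 0, 9, 9, 9, 0, 5])   # blank = 0
print(token_acoustic_embeddings(enc, align).shape)       # (3, 4): tokens 5, 9, 5
```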
- CTC Alignments Improve Autoregressive Translation [145.90587287444976]
We argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework.
Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation tasks.
arXiv Detail & Related papers (2022-10-11T07:13:50Z)
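Joint CTC/attention models are typically trained with an interpolated objective, L = λ·L_CTC + (1 − λ)·L_attention; the snippet below shows only that training-side combination on placeholder tensors, with λ = 0.3 as an illustrative value rather than the paper's setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

lam = 0.3  # CTC weight; illustrative, not taken from the paper
T, B, S, V = 50, 2, 7, 100  # frames, batch, target length, vocab size

# Placeholder model outputs: encoder log-probs (T, B, V), decoder logits (B, S, V).
enc_log_probs = F.log_softmax(torch.randn(T, B, V), dim=-1)
dec_logits = torch.randn(B, S, V)
targets = torch.randint(1, V, (B, S))  # 0 is reserved for the CTC blank

ctc = nn.CTCLoss(blank=0, zero_infinity=True)(
    enc_log_probs, targets, torch.full((B,), T), torch.full((B,), S))
att = F.cross_entropy(dec_logits.reshape(-1, V), targets.reshape(-1))
loss = lam * ctc + (1 - lam) * att  # joint CTC/attention objective
print(loss.item())
```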
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
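One plausible shape for such a mechanism is additive attention pooling over the tokens of a phrase span: score each token with a small learned network, softmax the scores, and take the weighted sum as the phrase representation. The sketch below is a guess at that flavor, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    """Collapse the token vectors of one phrase span into a single phrase
    vector via learned additive attention (a simplified stand-in)."""
    def __init__(self, d_model=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_model, d_model), nn.Tanh(), nn.Linear(d_model, 1))

    def forward(self, tokens):                           # (phrase_len, d_model)
        attn = torch.softmax(self.score(tokens), dim=0)  # (phrase_len, 1)
        return (attn * tokens).sum(dim=0)                # (d_model,)

pool = AttentivePhrasePooling(d_model=8)
print(pool(torch.randn(3, 8)).shape)  # torch.Size([8])
```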