A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text
Generation
- URL: http://arxiv.org/abs/2110.05249v1
- Date: Mon, 11 Oct 2021 13:05:06 GMT
- Title: A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text
Generation
- Authors: Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya
Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe
- Abstract summary: Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive baselines.
We conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR).
The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances.
- Score: 59.64193903397301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive (NAR) models simultaneously generate multiple outputs in a
sequence, which significantly reduces inference time at the cost of an
accuracy drop compared to autoregressive baselines. Because of their great potential
for real-time applications, an increasing number of NAR models have been explored
in different fields to narrow the performance gap with AR models. In this
work, we conduct a comparative study of various NAR modeling methods for
end-to-end automatic speech recognition (ASR). Experiments are performed in the
state-of-the-art setting using ESPnet. The results on various tasks provide
interesting findings for developing an understanding of NAR ASR, such as the
accuracy-speed trade-off and robustness against long-form utterances. We also
show that the techniques can be combined for further improvement and applied to
NAR end-to-end speech translation. All the implementations are publicly
available to encourage further research in NAR speech processing.
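To make the AR/NAR contrast concrete, below is a minimal sketch, not taken from the paper or from ESPnet, of greedy autoregressive versus non-autoregressive decoding over a toy logit matrix. The random logits, the bigram bias, and the function names are illustrative assumptions only; the NAR branch uses a CTC-style collapse purely as one familiar example of single-pass decoding.

```python
import numpy as np

VOCAB = ["<blank>", "a", "b", "c"]
T, V = 8, len(VOCAB)                      # 8 encoder frames, 4 symbols
rng = np.random.default_rng(0)
logits = rng.normal(size=(T, V))          # stand-in for a model's outputs
bigram_bias = rng.normal(size=(V, V))     # toy dependency on the previous token

def ar_greedy_decode(logits, bigram_bias):
    """Autoregressive: each step conditions on the previously emitted token,
    so the T steps can only run one after another."""
    out, prev = [], 0
    for t in range(logits.shape[0]):
        step = logits[t] + bigram_bias[prev]   # depends on the last output
        prev = int(step.argmax())
        out.append(prev)
    return out

def nar_greedy_decode(logits):
    """Non-autoregressive (CTC-style): every frame is decoded in one parallel
    argmax, then repeats are collapsed and <blank> symbols are removed."""
    frame_ids = logits.argmax(axis=-1)         # single vectorised pass
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != 0:
            out.append(int(i))
        prev = int(i)
    return out

print("AR :", [VOCAB[i] for i in ar_greedy_decode(logits, bigram_bias)])
print("NAR:", [VOCAB[i] for i in nar_greedy_decode(logits)])
```

The NAR pass removes the per-token sequential loop, which is where the inference-time gains come from; the methods compared in the paper refine this basic idea in various ways (iterative refinement, masking, and so on).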
Related papers
- Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation [15.632419297059993]
Non-autoregressive (NAR) language models are known for their low latency in neural machine translation (NMT).
A performance gap exists between NAR and autoregressive models due to the large decoding space and the difficulty of accurately capturing dependencies between target words.
We apply reinforcement learning (RL) to Levenshtein Transformer, a representative edit-based NAR model, demonstrating that RL with self-generated data can enhance the performance of edit-based NAR models.
arXiv Detail & Related papers (2024-05-02T13:39:28Z) - Leveraging Diverse Modeling Contexts with Collaborating Learning for
Neural Machine Translation [26.823126615724888]
Autoregressive (AR) and non-autoregressive (NAR) models are two types of generative models for neural machine translation (NMT).
We propose a novel generic collaborative learning method, DCMCL, where AR and NAR models are treated as collaborators instead of teachers and students.
arXiv Detail & Related papers (2024-02-28T15:55:02Z) - Non-Autoregressive Machine Translation: It's Not as Fast as it Seems [84.47091735503979]
We point out flaws in the evaluation methodology present in the literature on NAR models.
We compare NAR models with other widely used methods for improving efficiency.
We call for more realistic and extensive evaluation of NAR models in future work.
arXiv Detail & Related papers (2022-05-04T09:30:17Z) - A Survey on Non-Autoregressive Generation for Neural Machine Translation
and Beyond [145.43029264191543]
Non-autoregressive (NAR) generation was first proposed in neural machine translation (NMT) to speed up inference.
While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of translation accuracy compared to autoregressive (AR) generation.
Many new models and algorithms have been proposed to bridge the accuracy gap between NAR and AR generation.
arXiv Detail & Related papers (2022-04-20T07:25:22Z) - SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition [49.42625022146008]
We present the advantages of applying SRU++ to ASR tasks by comparing it with the Conformer across multiple ASR benchmarks.
Specifically, our analysis shows that SRU++ can surpass the Conformer on long-form speech input by a large margin.
arXiv Detail & Related papers (2021-10-11T19:23:50Z) - TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech
Recognition [69.68154370877615]
Non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict the entire output sequence in as little as one step.
To address these problems, we propose a new model, the two-step non-autoregressive transformer (TSNAT).
The results show that TSNAT achieves performance competitive with the AR model and outperforms many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z) - Improving Non-autoregressive Neural Machine Translation with Monolingual
Data [13.43438045177293]
Non-autoregressive (NAR) neural machine translation is usually trained via knowledge distillation from an autoregressive (AR) model (a toy sketch of this step appears after this list).
We leverage large monolingual corpora to improve the NAR model's performance.
arXiv Detail & Related papers (2020-05-02T22:24:52Z) - A Study of Non-autoregressive Model for Sequence Generation [147.89525760170923]
Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel.
We propose knowledge distillation and source-target alignment to bridge the gap between AR and NAR models.
arXiv Detail & Related papers (2020-04-22T09:16:09Z)
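Two of the entries above rely on knowledge distillation from an AR teacher to train a NAR translation model. Below is a minimal, self-contained sketch of sequence-level distillation under illustrative assumptions: `teacher_translate` stands in for any trained AR model's (beam-search) decoding function, and the lookup-table teacher exists only to make the snippet runnable.

```python
from typing import Callable, Iterable, List, Tuple

def build_distilled_pairs(
    sources: Iterable[str],
    teacher_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Sequence-level knowledge distillation: pair every source sentence with
    the AR teacher's own translation instead of the human reference. Source-side
    monolingual text can go through the same path, since no reference is needed,
    which enlarges the NAR student's training set."""
    return [(src, teacher_translate(src)) for src in sources]

if __name__ == "__main__":
    # Toy "teacher": a lookup table standing in for an AR NMT model.
    toy_teacher = {"guten morgen": "good morning", "danke": "thank you"}
    parallel_sources = ["guten morgen"]      # has a human reference we discard
    monolingual_sources = ["danke"]          # has no reference at all
    distilled = build_distilled_pairs(
        parallel_sources + monolingual_sources,
        lambda s: toy_teacher.get(s, "<unk>"),
    )
    print(distilled)  # pairs the NAR student would be trained on
```

Training the NAR student on these teacher-generated targets reduces the multimodality of the data, which is the usual motivation for distillation in NAR translation.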