Hybrid-Regressive Neural Machine Translation
- URL: http://arxiv.org/abs/2210.10416v1
- Date: Wed, 19 Oct 2022 09:26:15 GMT
- Title: Hybrid-Regressive Neural Machine Translation
- Authors: Qiang Wang, Xinhui Hu, Ming Chen
- Abstract summary: We investigate how to better combine the strengths of the autoregressive and non-autoregressive translation paradigms.
We propose a new two-stage translation prototype called hybrid-regressive translation (HRT).
HRT achieves the state-of-the-art BLEU score of 28.49 on the WMT En-De task and is at least 1.5x faster than AT, regardless of batch size and device.
- Score: 11.634586560239404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we empirically confirm that non-autoregressive translation with
an iterative refinement mechanism (IR-NAT) suffers from poor acceleration
robustness because it is more sensitive to decoding batch size and computing
device settings than autoregressive translation (AT). Motivated by this finding, we
investigate how to better combine the strengths of the autoregressive and
non-autoregressive translation paradigms. To this end, we demonstrate through
synthetic experiments that prompting with a small number of AT predictions can
bring one-shot non-autoregressive translation up to the performance of IR-NAT.
Following this line, we propose a new two-stage translation prototype called
hybrid-regressive translation (HRT). Specifically, HRT first generates a
discontinuous sequence autoregressively (e.g., making a prediction every k
tokens, k > 1) and then fills in all previously skipped tokens
at once in a non-autoregressive manner. We also propose a bag of techniques to
effectively and efficiently train HRT without adding any model parameters. HRT
achieves the state-of-the-art BLEU score of 28.49 on the WMT En-De task and is
at least 1.5x faster than AT, regardless of batch size and device. As an added
bonus, HRT also inherits the favorable characteristics of AT under the
deep-encoder-shallow-decoder architecture. Concretely, compared to
the vanilla HRT with a 6-layer encoder and 6-layer decoder, the inference speed
of HRT with a 12-layer encoder and 1-layer decoder is further doubled on both
GPU and CPU without BLEU loss.
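To make the two-stage decoding described in the abstract concrete, here is a minimal, illustrative Python sketch of HRT-style inference. It is not the authors' implementation: the callables at_step and nar_fill, the chunk size k, the mask placeholder, and the exact placement of skipped slots are all assumptions introduced for illustration; in the paper both stages share a single model without extra parameters.

```python
# Minimal, illustrative sketch of HRT-style two-stage decoding (not the
# authors' implementation). Assumptions: `at_step` autoregressively predicts
# the next retained token given the current skeleton, and `nar_fill` fills
# every masked position in a single forward pass.

MASK, EOS = "<mask>", "</s>"

def hrt_decode(src_tokens, at_step, nar_fill, k=2, max_len=128):
    """Stage 1: skip-autoregression; Stage 2: one-shot non-autoregressive infill."""
    # Stage 1: autoregressively emit only every k-th target token, producing a
    # discontinuous "skeleton" that is roughly 1/k of the full target length.
    skeleton = []
    while len(skeleton) * k < max_len:
        next_tok = at_step(src_tokens, skeleton)
        skeleton.append(next_tok)
        if next_tok == EOS:
            break

    # Stage 2: build a template with mask placeholders for the skipped slots
    # (their exact placement relative to the kept tokens is an assumption here)
    # and let the non-autoregressive pass predict all of them at once.
    template = []
    for tok in skeleton:
        template.append(tok)
        if tok != EOS:
            template.extend([MASK] * (k - 1))
    return nar_fill(src_tokens, template)
```

Because stage 1 needs only about 1/k of AT's sequential steps and stage 2 is a single parallel pass, the speedup does not hinge on large-batch parallelism, which is consistent with the reported speedup of at least 1.5x across batch sizes and devices.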
Related papers
- Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR [17.950722198543897]
We present Hybrid-Autoregressive INference TrANsducers (HAINAN), a novel architecture for speech recognition.
HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor.
arXiv Detail & Related papers (2024-10-03T15:38:20Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67x.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- The RoyalFlush System for the WMT 2022 Efficiency Task [11.00644143928471]
This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task.
Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation.
Our fastest system reaches 6k+ words/second in the GPU latency setting, estimated to be about 3.1x faster than last year's winner.
arXiv Detail & Related papers (2022-12-03T05:36:10Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- Non-Autoregressive Neural Machine Translation: A Call for Clarity [3.1447111126465]
We take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models.
We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants.
We contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks.
arXiv Detail & Related papers (2022-05-21T12:15:22Z)
- Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision [33.04082398101807]
Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient.
Recent non-autoregressive translation models speed up the inference, but their quality is still inferior.
We propose DSLP, a highly efficient and high-performance model for machine translation.
arXiv Detail & Related papers (2021-10-14T16:36:12Z)
- Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input [54.82369261350497]
We propose a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module.
Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding speed than a strong AR baseline with only 0.0-0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
arXiv Detail & Related papers (2020-10-28T15:00:09Z)
- Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation [68.25872110275542]
We propose an efficient inference procedure for non-autoregressive machine translation.
It iteratively refines translation purely in the continuous space.
We evaluate our approach on WMT'14 En-De, WMT'16 Ro-En and IWSLT'16 De-En.
arXiv Detail & Related papers (2020-09-15T15:30:14Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translations with an 8x-15x speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
- Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation [78.51887060865273]
We show that a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed.
Our results establish a new protocol for future research toward fast, accurate machine translation.
arXiv Detail & Related papers (2020-06-18T09:06:49Z)
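On the deep-encoder-shallow-decoder point shared by this last entry and the HRT abstract above, the snippet below is a purely illustrative configuration sketch using PyTorch's generic nn.Transformer (neither paper's actual codebase): the layer budget is shifted from the decoder, which runs repeatedly during generation, to the encoder, which runs once per sentence.

```python
import torch.nn as nn

# Illustrative only: reallocating the layer budget as in the
# deep-encoder-shallow-decoder setup (12 encoder / 1 decoder layers),
# versus the conventional balanced 6/6 split. Sizes are placeholders.
balanced = nn.Transformer(d_model=512, nhead=8,
                          num_encoder_layers=6, num_decoder_layers=6)
deep_enc = nn.Transformer(d_model=512, nhead=8,
                          num_encoder_layers=12, num_decoder_layers=1)
# The encoder is evaluated once per sentence, while the decoder is evaluated
# at every generation step, so shrinking the decoder cuts inference cost most.
```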