Non-Autoregressive Translation with Layer-Wise Prediction and Deep
Supervision
- URL: http://arxiv.org/abs/2110.07515v1
- Date: Thu, 14 Oct 2021 16:36:12 GMT
- Title: Non-Autoregressive Translation with Layer-Wise Prediction and Deep
Supervision
- Authors: Chenyang Huang, Hao Zhou, Osmar R. Zaïane, Lili Mou, Lei Li
- Abstract summary: Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient.
Recent non-autoregressive translation models speed up the inference, but their quality is still inferior.
We propose DSLP, a highly efficient and high-performance model for machine translation.
- Score: 33.04082398101807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do we perform efficient inference while retaining high translation
quality? Existing neural machine translation models, such as Transformer,
achieve high performance, but they decode words one by one, which is
inefficient. Recent non-autoregressive translation models speed up the
inference, but their quality is still inferior. In this work, we propose DSLP,
a highly efficient and high-performance model for machine translation. The key
insight is to train a non-autoregressive Transformer with Deep Supervision and
feed additional Layer-wise Predictions. We conducted extensive experiments on
four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO).
Results show that our approach consistently improves the BLEU scores compared
with respective base models. Specifically, our best variant outperforms the
autoregressive model on three translation tasks, while being 14.8 times more
efficient in inference.
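As an illustration of the abstract's two ingredients, the sketch below shows a decoder in which every layer emits a prediction over the vocabulary (layer-wise prediction), feeds that prediction back into the next layer's input, and contributes a cross-entropy term to the training objective (deep supervision). This is a minimal sketch assuming PyTorch; the module names, the concatenation-based fusion, and the omission of cross-attention to the source encoder are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative sketch of the two ideas named in the abstract:
# (1) layer-wise prediction: every decoder layer predicts the target tokens and
#     the predicted token embeddings are mixed back into the next layer's input;
# (2) deep supervision: a cross-entropy term is applied to every layer's prediction.
# Names, the mixing scheme, and the lack of encoder cross-attention are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DSLPDecoderSketch(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.classifier = nn.Linear(d_model, vocab_size)   # shared prediction head
        self.mix = nn.Linear(2 * d_model, d_model)          # fuse state + prediction

    def forward(self, decoder_input, targets=None):
        # decoder_input: (batch, tgt_len, d_model), e.g. a copied/upsampled
        # representation of the source; all positions are decoded in parallel.
        h = decoder_input
        total_loss, logits = 0.0, None
        for layer in self.layers:
            h = layer(h)                          # parallel (non-autoregressive) step
            logits = self.classifier(h)           # layer-wise prediction
            if targets is not None:               # deep supervision at every layer
                total_loss = total_loss + F.cross_entropy(
                    logits.transpose(1, 2), targets
                )
            # feed this layer's prediction into the next layer's input
            pred_embed = self.tok_embed(logits.argmax(dim=-1))
            h = self.mix(torch.cat([h, pred_embed], dim=-1))
        loss = total_loss / len(self.layers) if targets is not None else None
        return logits, loss


# Toy usage with random tensors, just to show the shapes involved.
model = DSLPDecoderSketch(vocab_size=1000)
x = torch.randn(2, 7, 512)              # (batch, tgt_len, d_model)
y = torch.randint(0, 1000, (2, 7))      # (batch, tgt_len)
logits, loss = model(x, y)
loss.backward()
```

The "respective base models" mentioned in the abstract are the NAT baselines that DSLP is layered on; the sketch only shows where the layer-wise predictions and per-layer losses enter, not those base training schemes.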
Related papers
- Transformers for Low-Resource Languages: Is Féidir Linn! [2.648836772989769]
In general, neural translation models often underperform on language pairs with insufficient training data.
We demonstrate that choosing appropriate parameters leads to considerable performance improvements.
A Transformer optimized model demonstrated a BLEU score improvement of 7.8 points when compared with a baseline RNN model.
arXiv Detail & Related papers (2024-03-04T12:29:59Z) - A Paradigm Shift in Machine Translation: Boosting Translation
Performance of Large Language Models [27.777372498182864]
We propose a novel fine-tuning approach for Generative Large Language Models (LLMs).
Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data.
Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance.
arXiv Detail & Related papers (2023-09-20T22:53:15Z) - Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively (a hedged CTC sketch appears after this list).
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z) - On the Pareto Front of Multilingual Neural Machine Translation [123.94355117635293]
We study how the performance of a given translation direction changes with its sampling ratio in multilingual neural machine translation (MNMT).
We propose the Double Power Law to predict the unique performance trade-off front in MNMT.
In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget.
arXiv Detail & Related papers (2023-04-06T16:49:19Z) - Candidate Soups: Fusing Candidate Results Improves Translation Quality
for Non-Autoregressive Translation [15.332496335303189]
Non-autoregressive translation (NAT) models achieve much faster inference than autoregressive translation (AT) models.
Existing NAT methods only focus on improving the NAT model's performance but do not fully utilize it.
We propose a simple but effective method called "Candidate Soups," which can obtain high-quality translations.
arXiv Detail & Related papers (2023-01-27T02:39:42Z) - Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z) - Improving Multilingual Translation by Representation and Gradient
Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z) - Enriching Non-Autoregressive Transformer with Syntactic and
Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model achieves significantly faster decoding while maintaining translation quality comparable to several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z) - Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine
Translation [78.51887060865273]
We show that a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed.
Our results establish a new protocol for future research toward fast, accurate machine translation.
arXiv Detail & Related papers (2020-06-18T09:06:49Z) - Multi-layer Representation Fusion for Neural Machine Translation [38.12309528346962]
We propose a multi-layer representation fusion (MLRF) approach to fusing stacked layers.
In particular, we design three fusion functions to learn a better representation from the stack.
The result is a new state of the art in German-English translation.
arXiv Detail & Related papers (2020-02-16T23:53:07Z)
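The CTC-based NAT entry above ("Improving Non-autoregressive Translation Quality ... for CTC") relies on connectionist temporal classification: the decoder emits an upsampled output sequence and CTC marginalizes over all monotonic alignments to the reference, so no explicit alignment or length prediction is required. Below is a minimal sketch using PyTorch's built-in nn.CTCLoss; the shapes, upsampling ratio, and random tensors are placeholders, not that paper's configuration.

```python
# Hedged illustration of CTC loss over non-autoregressive decoder outputs.
# The upsampling ratio, shapes, and random tensors are placeholders only.
import torch
import torch.nn as nn

vocab_size, blank_id = 32000, 0
batch, src_len, upsample = 2, 10, 3          # NAT output length = upsample * src_len
out_len = src_len * upsample

# Stand-in for decoder output: per-position distributions over the vocabulary.
logits = torch.randn(batch, out_len, vocab_size, requires_grad=True)
log_probs = logits.log_softmax(-1).transpose(0, 1)   # CTCLoss expects (T, N, C)

# Padded reference translations and their true lengths.
targets = torch.randint(1, vocab_size, (batch, 12))  # token ids avoid the blank id
target_lengths = torch.tensor([12, 9])
input_lengths = torch.full((batch,), out_len)

# CTC sums over all monotonic alignments between the upsampled decoder
# positions and the reference tokens, so no explicit alignment is needed.
ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # would update a real NAT decoder during training
```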