Shallow-to-Deep Training for Neural Machine Translation
- URL: http://arxiv.org/abs/2010.03737v1
- Date: Thu, 8 Oct 2020 02:36:07 GMT
- Title: Shallow-to-Deep Training for Neural Machine Translation
- Authors: Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen
Wang and Jingbo Zhu
- Abstract summary: In this paper, we investigate the behavior of a well-tuned deep Transformer system.
We find that stacking layers is helpful in improving the representation ability of NMT models.
This inspires us to develop a shallow-to-deep training method that learns deep models by stacking shallow models.
- Score: 42.62107851930165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep encoders have been proven to be effective in improving neural machine
translation (NMT) systems, but training an extremely deep encoder is time
consuming. Moreover, why deep models help NMT is an open question. In this
paper, we investigate the behavior of a well-tuned deep Transformer system. We
find that stacking layers is helpful in improving the representation ability of
NMT models, and that adjacent layers perform similarly. This inspires us to develop a
shallow-to-deep training method that learns deep models by stacking shallow
models. In this way, we successfully train a Transformer system with a 54-layer
encoder. Experimental results on WMT'16 English-German and WMT'14
English-French translation tasks show that it is $1.4\times$ faster than
training from scratch and achieves BLEU scores of $30.33$ and $43.29$ on the
two tasks. The code is publicly available at
https://github.com/libeineu/SDT-Training/.
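At a high level, the shallow-to-deep recipe can be read as progressive stacking: train a shallow encoder for some steps, duplicate its trained layers to warm-start a deeper encoder, and continue training until the target depth is reached. The PyTorch sketch below illustrates only this stacking initialization; the names Encoder and grow_encoder and the copy-the-whole-stack growth rule are assumptions for illustration, not the released SDT-Training implementation (see the repository linked above).
```python
# Illustrative sketch only: progressive stacking to warm-start a deeper encoder.
import copy
import torch.nn as nn

class Encoder(nn.Module):
    """A plain Transformer encoder: a stack of identical layers."""
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, x, mask=None):
        for layer in self.layers:
            x = layer(x, src_key_padding_mask=mask)
        return x

def grow_encoder(shallow: Encoder, copies: int = 2) -> Encoder:
    """Build a deeper encoder by stacking copies of a trained shallow one.

    With copies=2, a trained 6-layer encoder yields a 12-layer encoder whose
    upper half duplicates the trained lower half; training then resumes from
    this warm start instead of from random initialization.
    """
    stacked = []
    for _ in range(copies):
        stacked.extend(copy.deepcopy(layer) for layer in shallow.layers)
    return Encoder(stacked)

# Usage: train shallow, grow, continue training, and repeat to the target depth.
base = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
shallow = Encoder([copy.deepcopy(base) for _ in range(6)])
# ... train `shallow` for some steps ...
deeper = grow_encoder(shallow, copies=2)  # 12 layers, warm-started
```
Repeating the grow-and-train step reaches depths such as the 54-layer encoder reported in the abstract, while each intermediate model starts from trained weights rather than random initialization.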
Related papers
- The NiuTrans System for WNGT 2020 Efficiency Task [32.88733142090084]
This paper describes the submissions of the NiuTrans Team to the WNGT 2020 Efficiency Shared Task.
We focus on the efficient implementation of deep Transformer models using NiuTensor, a flexible toolkit for NLP tasks.
arXiv Detail & Related papers (2021-09-16T14:32:01Z)
- Efficient Inference for Multilingual Neural Machine Translation [60.10996883354372]
We consider several ways to make multilingual NMT faster at inference without degrading its quality.
Our experiments demonstrate that combining a shallow decoder with vocabulary filtering more than doubles inference speed with no loss in translation quality.
arXiv Detail & Related papers (2021-09-14T13:28:13Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Dynamic Multi-Branch Layers for On-Device Neural Machine Translation [53.637479651600586]
We propose to improve the performance of on-device neural machine translation (NMT) systems with dynamic multi-branch layers.
Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference.
At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English translation task.
arXiv Detail & Related papers (2021-05-14T07:32:53Z)
- Learning Light-Weight Translation Models from Deep Transformer [25.386460662408773]
We propose a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model.
Our compressed model is 8X shallower than the deep model, with almost no loss in BLEU.
To further enhance the teacher model, we present a Skipping Sub-Layer method that randomly omits sub-layers to introduce perturbation into training (see the sketch after this list).
arXiv Detail & Related papers (2020-12-27T05:33:21Z)
- Very Deep Transformers for Neural Machine Translation [100.51465892354234]
We show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers.
These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU.
arXiv Detail & Related papers (2020-08-18T07:14:54Z)
- Norm-Based Curriculum Learning for Neural Machine Translation [45.37588885850862]
A neural machine translation (NMT) system is expensive to train, especially in high-resource settings.
In this paper, we aim to improve the efficiency of training an NMT by introducing a novel norm-based curriculum learning method.
The proposed method outperforms strong baselines in terms of BLEU score (+1.17/+1.56) and training speedup (2.22x/3.33x).
arXiv Detail & Related papers (2020-06-03T02:22:00Z)
- Multiscale Collaborative Deep Models for Neural Machine Translation [40.52423993051359]
We present a MultiScale Collaborative (MSC) framework to ease the training of NMT models that are substantially deeper than those used previously.
We explicitly boost the gradient back-propagation from top to bottom levels by introducing a block-scale collaboration mechanism into deep NMT models.
Our deep MSC achieves a BLEU score of 30.56 on the WMT14 English-German task, significantly outperforming state-of-the-art deep NMT models.
arXiv Detail & Related papers (2020-04-29T08:36:08Z)
- Neural Machine Translation: Challenges, Progress and Future [62.75523637241876]
Machine translation (MT) is a technique that leverages computers to translate human languages automatically.
Neural machine translation (NMT) models the direct mapping between source and target languages with deep neural networks.
This article reviews the NMT framework, discusses the challenges in NMT, and introduces some exciting recent progress.
arXiv Detail & Related papers (2020-04-13T07:53:57Z)
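The Skipping Sub-Layer method mentioned in the "Learning Light-Weight Translation Models from Deep Transformer" entry above amounts to stochastically bypassing residual sub-layers during training. The sketch below illustrates that idea; the class name SkippableSubLayer, the pre-norm residual layout, and the skip probability of 0.2 are assumptions for illustration, not that paper's implementation.
```python
import torch
import torch.nn as nn

class SkippableSubLayer(nn.Module):
    """Residual sub-layer (pre-norm) that is randomly skipped during training."""
    def __init__(self, sublayer: nn.Module, d_model: int, p_skip: float = 0.2):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.p_skip = p_skip

    def forward(self, x):
        # With probability p_skip (training only), drop the sub-layer entirely
        # and let the identity path carry the representation unchanged.
        if self.training and torch.rand(1).item() < self.p_skip:
            return x
        return x + self.sublayer(self.norm(x))

# Example: a feed-forward sub-layer skipped roughly 20% of the time.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
block = SkippableSubLayer(ffn, d_model=512, p_skip=0.2)
out = block(torch.randn(8, 10, 512))  # (batch, length, d_model)
```
A complete encoder layer would wrap both its self-attention and feed-forward sub-layers this way and pass attention masks through; this sketch only shows the stochastic skipping itself.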