Multiscale Collaborative Deep Models for Neural Machine Translation
- URL: http://arxiv.org/abs/2004.14021v3
- Date: Mon, 11 May 2020 01:21:22 GMT
- Title: Multiscale Collaborative Deep Models for Neural Machine Translation
- Authors: Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo
- Abstract summary: We present a MultiScale Collaborative (MSC) framework to ease the training of NMT models that are substantially deeper than those used previously.
We explicitly boost the gradient back-propagation from top to bottom levels by introducing a block-scale collaboration mechanism into deep NMT models.
Our deep MSC achieves a BLEU score of 30.56 on the WMT14 English-German task, significantly outperforming state-of-the-art deep NMT models.
- Score: 40.52423993051359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent evidence reveals that Neural Machine Translation (NMT) models with
deeper neural networks can be more effective but are difficult to train. In
this paper, we present a MultiScale Collaborative (MSC) framework to ease the
training of NMT models that are substantially deeper than those used
previously. We explicitly boost the gradient back-propagation from top to
bottom levels by introducing a block-scale collaboration mechanism into deep
NMT models. Then, instead of forcing the whole encoder stack to directly learn
a desired representation, we let each encoder block learn a fine-grained
representation and enhance it by encoding spatial dependencies using a
context-scale collaboration. We provide empirical evidence showing that the MSC
nets are easy to optimize and can obtain improvements of translation quality
from considerably increased depth. On IWSLT translation tasks with three
translation directions, our extremely deep models (with 72-layer encoders)
surpass strong baselines by +2.2~+3.1 BLEU points. In addition, our deep MSC
achieves a BLEU score of 30.56 on the WMT14 English-German task, significantly
outperforming state-of-the-art deep NMT models.
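Only the abstract is available here, so the following is a minimal sketch of one way to read the block-scale collaboration idea: split a very deep encoder into blocks and fuse every block's output into the final representation (here via a softmax-weighted sum), so the training loss sends gradients directly into the bottom blocks as well as the top. The class name, block/layer counts, and the fusion rule are illustrative assumptions, not the authors' implementation, and the context-scale collaboration is not modeled.

```python
import torch
import torch.nn as nn

class BlockCollaborativeEncoder(nn.Module):
    """Deep encoder split into blocks whose outputs are fused at the top (hypothetical sketch)."""

    def __init__(self, d_model=512, nhead=8, num_blocks=12, layers_per_block=6):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(
                    d_model, nhead, dim_feedforward=2048, batch_first=True),
                num_layers=layers_per_block)
            for _ in range(num_blocks))
        # One learnable scalar per block, softmax-normalised at fusion time.
        self.block_logits = nn.Parameter(torch.zeros(num_blocks))

    def forward(self, x, src_key_padding_mask=None):
        block_outputs = []
        h = x
        for block in self.blocks:
            h = block(h, src_key_padding_mask=src_key_padding_mask)
            block_outputs.append(h)  # keep each block-scale representation
        weights = torch.softmax(self.block_logits, dim=0)
        # Weighted sum: the loss sends gradients straight into every block
        # rather than only through the topmost layer.
        return sum(w * o for w, o in zip(weights, block_outputs))

# Small demo configuration (12 blocks x 6 layers would give the 72-layer
# encoder mentioned in the abstract, but is heavy to instantiate here).
encoder = BlockCollaborativeEncoder(num_blocks=3, layers_per_block=2)
out = encoder(torch.randn(2, 16, 512))  # (batch, seq_len, d_model)
print(out.shape)                        # torch.Size([2, 16, 512])
```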
Related papers
- Improving Neural Machine Translation by Multi-Knowledge Integration with Prompting [36.24578487904221]
We focus on how to integrate multi-knowledge, i.e., multiple types of knowledge, into NMT models to enhance performance with prompting.
We propose a unified framework which can effectively integrate multiple types of knowledge, including sentences, terminologies/phrases and translation templates, into NMT models.
arXiv Detail & Related papers (2023-12-08T02:55:00Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT)
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures by masking parts of the input and reconstructing them in the decoder (a toy masking sketch follows this list).
We compare masking with alternative objectives that produce inputs resembling real (full) sentences by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Self-supervised and Supervised Joint Training for Resource-rich Machine Translation [30.502625878505732]
Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose a joint training approach, $F$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models.
arXiv Detail & Related papers (2021-06-08T02:35:40Z)
- Shallow-to-Deep Training for Neural Machine Translation [42.62107851930165]
In this paper, we investigate the behavior of a well-tuned deep Transformer system.
We find that stacking layers is helpful in improving the representation ability of NMT models.
This inspires us to develop a shallow-to-deep training method that learns deep models by stacking shallow models (see the stacking sketch after this list).
arXiv Detail & Related papers (2020-10-08T02:36:07Z)
- Very Deep Transformers for Neural Machine Translation [100.51465892354234]
We show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers (a configuration sketch follows this list).
These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU.
arXiv Detail & Related papers (2020-08-18T07:14:54Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Multi-layer Representation Fusion for Neural Machine Translation [38.12309528346962]
We propose a multi-layer representation fusion (MLRF) approach to fusing stacked layers.
In particular, we design three fusion functions to learn a better representation from the stack.
The result is a new state of the art in German-English translation.
arXiv Detail & Related papers (2020-02-16T23:53:07Z)
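For "Exploring Unsupervised Pretraining Objectives for Machine Translation" above, a toy sketch of the masked-language-modeling setup for sequence-to-sequence models described there: corrupt the source by masking tokens and let the decoder reconstruct the original sentence. The masking rate, the `<mask>` symbol, and the helper name are illustrative assumptions, not values from that paper.

```python
import random

def mask_for_seq2seq(tokens, mask_token="<mask>", rate=0.3, seed=0):
    """Return (corrupted encoder input, original decoder target)."""
    rng = random.Random(seed)
    corrupted = [mask_token if rng.random() < rate else tok for tok in tokens]
    return corrupted, list(tokens)

# The corrupted copy goes to the encoder; the decoder learns to emit the
# original sentence (the 0.3 rate is an arbitrary illustrative choice).
src, tgt = mask_for_seq2seq("masking parts of the input and reconstructing them".split())
print(src)
print(tgt)
```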
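For "Shallow-to-Deep Training for Neural Machine Translation", a minimal sketch of one reading of the stacking idea: initialize a deeper encoder by repeating the layers of an already-trained shallow encoder, then continue training. The helper name and the repeat factor are hypothetical; the paper's actual growth schedule may differ.

```python
import copy
import torch.nn as nn

def stack_shallow(shallow_layers: nn.ModuleList, times: int) -> nn.ModuleList:
    """Build a deeper layer stack by repeating a trained shallow stack."""
    return nn.ModuleList(
        copy.deepcopy(layer)            # reuse the trained weights as-is
        for _ in range(times)
        for layer in shallow_layers)

# e.g. grow a (notionally pre-trained) 6-layer encoder into a 24-layer one
# and keep training the deeper model from that initialization.
shallow = nn.ModuleList(
    nn.TransformerEncoderLayer(512, 8, batch_first=True) for _ in range(6))
deep = stack_shallow(shallow, times=4)
print(len(deep))  # 24
```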
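For "Very Deep Transformers for Neural Machine Translation", the quoted 60-encoder-layer / 12-decoder-layer budget can be written down with PyTorch's stock Transformer as below; this only reproduces the depth, not the initialization and training recipe that make such depths trainable in the original work.

```python
import torch.nn as nn

# Depth only; hyperparameters other than the layer counts are generic defaults.
very_deep = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=60,   # deep encoder, as quoted above
    num_decoder_layers=12,   # comparatively shallow decoder
    dim_feedforward=2048,
    batch_first=True,
)
print(f"{sum(p.numel() for p in very_deep.parameters()) / 1e6:.0f}M parameters")
```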
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.