Multi-layer Representation Fusion for Neural Machine Translation
- URL: http://arxiv.org/abs/2002.06714v1
- Date: Sun, 16 Feb 2020 23:53:07 GMT
- Title: Multi-layer Representation Fusion for Neural Machine Translation
- Authors: Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li, Jingbo Zhu
- Abstract summary: We propose a multi-layer representation fusion (MLRF) approach to fusing stacked layers.
In particular, we design three fusion functions to learn a better representation from the stack.
The result is a new state of the art in German-English translation.
- Score: 38.12309528346962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural machine translation systems stack many layers to build deep
models, but the prediction depends only on the sentence representation of the
top-most layer, with no access to low-level representations. This makes the
model harder to train and risks losing information before prediction. In this
paper, we propose a multi-layer representation fusion (MLRF) approach to fusing
stacked layers. In particular, we design three fusion functions to learn a
better representation from the stack. Experimental results show that our
approach yields improvements of 0.92 and 0.56 BLEU points over the strong
Transformer baseline on the IWSLT German-English and NIST Chinese-English MT
tasks, respectively. The result is a new state of the art in German-English
translation.
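The abstract does not spell out the three fusion functions. As a minimal sketch of the general idea, assuming the encoder exposes each stacked layer's output: a learned softmax-weighted sum is one common instance of multi-layer fusion (the class name, shapes, and fusion form below are illustrative, not the paper's exact design).

```python
import torch
import torch.nn as nn

class WeightedLayerFusion(nn.Module):
    """Fuse the outputs of all stacked layers into one representation.

    A sketch of one plausible fusion function (a learned softmax-weighted
    sum), not the paper's exact formulation.
    """

    def __init__(self, num_layers: int):
        super().__init__()
        # One scalar weight per layer, normalized with softmax in forward().
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs: list[torch.Tensor]) -> torch.Tensor:
        # layer_outputs: num_layers tensors of shape (batch, seq_len, d_model)
        stacked = torch.stack(layer_outputs, dim=0)           # (L, B, T, D)
        weights = torch.softmax(self.layer_logits, dim=0)     # (L,)
        return torch.einsum("l,lbtd->btd", weights, stacked)  # (B, T, D)

# Usage: collect every layer's output during the encoder pass, then feed
# the fused representation to the prediction layer instead of only the top.
fusion = WeightedLayerFusion(num_layers=6)
outputs = [torch.randn(2, 5, 512) for _ in range(6)]
fused = fusion(outputs)  # shape: (2, 5, 512)
```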
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only 1/3 of the parameters.
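The summary names the ingredients but not the mechanics. As a loose illustration of the EMA part only: a combination coefficient used by a higher-order predictor step can be smoothed across training with an exponential moving average (both functions below are assumptions for illustration, not the paper's formulation).

```python
import torch

def ema_update(ema_coeff: torch.Tensor, new_coeff: torch.Tensor,
               decay: float = 0.99) -> torch.Tensor:
    # Exponential moving average: smooth a learned coefficient over steps.
    return decay * ema_coeff + (1.0 - decay) * new_coeff

def predictor_step(x_prev: torch.Tensor, x_curr: torch.Tensor,
                   f_curr: torch.Tensor, coeff: torch.Tensor) -> torch.Tensor:
    # A second-order "predictor": advance the current state by the layer
    # update f_curr plus coefficient-weighted momentum from the previous state.
    return x_curr + f_curr + coeff * (x_curr - x_prev)
```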
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularizing NMT models at both the representation level and the gradient level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Residual Tree Aggregation of Layers for Neural Machine Translation [11.660776324473645]
We propose a residual tree aggregation of layers for the Transformer (RTAL), which helps to fuse information across layers.
Specifically, we fuse the information across layers by constructing a post-order binary tree.
Our model is based on the Transformer, and we conduct experiments on the WMT14 English-to-German and WMT17 English-to-French translation tasks.
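A hedged sketch of the tree idea, assuming the fusion node is a simple linear layer with residual connections (RTAL's actual node and traversal details may differ):

```python
import torch
import torch.nn as nn

class TreeAggregation(nn.Module):
    """Aggregate layer outputs with a binary tree of fusion nodes.

    Adjacent representations are merged pairwise, level by level, until a
    single representation remains; every merge is one tree node.
    """

    def __init__(self, d_model: int, num_layers: int):
        super().__init__()
        # A tree over num_layers leaves always has num_layers - 1 merges.
        self.fusers = nn.ModuleList(
            nn.Linear(2 * d_model, d_model) for _ in range(num_layers - 1)
        )

    def forward(self, layer_outputs: list[torch.Tensor]) -> torch.Tensor:
        nodes, k = list(layer_outputs), 0
        while len(nodes) > 1:
            merged = []
            for left, right in zip(nodes[0::2], nodes[1::2]):
                fused = self.fusers[k](torch.cat([left, right], dim=-1))
                merged.append(fused + left + right)  # residual connection
                k += 1
            if len(nodes) % 2 == 1:  # odd count: carry the last node upward
                merged.append(nodes[-1])
            nodes = merged
        return nodes[0]
```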
arXiv Detail & Related papers (2021-07-19T09:32:10Z)
- Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation [18.782750537161615]
We propose to share parameters across all layers, thereby leading to a recurrently stacked neural network model.
We empirically demonstrate that the translation quality of a model that recurrently stacks a single layer 6 times, despite having significantly fewer parameters, approaches that of a model that stacks 6 layers where each layer has different parameters.
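The core trick is easy to show. A minimal sketch, assuming a standard PyTorch encoder layer stands in for the paper's layer:

```python
import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    """One shared encoder layer applied num_steps times.

    Parameter count of a single layer, effective depth of num_steps.
    """

    def __init__(self, d_model: int = 512, nhead: int = 8, num_steps: int = 6):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.num_steps = num_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_steps):  # the same weights are reused each step
            x = self.shared_layer(x)
        return x

encoder = RecurrentlyStackedEncoder()
y = encoder(torch.randn(2, 10, 512))  # 6 applications of one layer
```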
arXiv Detail & Related papers (2021-06-18T08:48:01Z)
- Deep Transformers with Latent Depth [42.33955275626127]
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks.
We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection.
We propose a novel method to train one shared Transformer network for multilingual machine translation.
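A minimal sketch of per-layer selection, assuming a relaxed Bernoulli gate during training and a hard threshold at inference (the paper's prior, estimator, and multilingual sharing scheme are simplified away here):

```python
import torch
import torch.nn as nn

class LatentDepthLayer(nn.Module):
    """Wrap a layer with a learned use-or-skip gate.

    Each layer carries a scalar logit defining a distribution over layer
    selection; a relaxed Bernoulli sample gates its contribution.
    """

    def __init__(self, layer: nn.Module, temperature: float = 1.0):
        super().__init__()
        self.layer = layer
        self.logit = nn.Parameter(torch.zeros(1))  # selection parameter
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Gumbel-sigmoid relaxation of the Bernoulli selection variable.
            u = torch.rand_like(self.logit)
            noise = torch.log(u) - torch.log1p(-u)
            z = torch.sigmoid((self.logit + noise) / self.temperature)
        else:
            z = (torch.sigmoid(self.logit) > 0.5).float()  # hard select/skip
        return x + z * self.layer(x)  # the layer is skipped when z is near 0
```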
arXiv Detail & Related papers (2020-09-28T07:13:23Z)
- A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
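A hedged sketch of one such fusion layer: attention-based message passing restricted to graph edges, over a node matrix that mixes text-token and image-region nodes (the node layout, adjacency construction, and dimensions are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GraphFusionLayer(nn.Module):
    """One fusion step over a unified multi-modal graph.

    Nodes attend only to their graph neighbors; adj is assumed to include
    self-loops so every node can attend to itself.
    """

    def __init__(self, d_model: int = 512, nhead: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, num_nodes, d_model); adj: (batch, num_nodes, num_nodes)
        mask = adj == 0  # True blocks attention between unconnected nodes
        mask = mask.repeat_interleave(self.attn.num_heads, dim=0)
        h, _ = self.attn(nodes, nodes, nodes, attn_mask=mask)
        nodes = nodes + h               # residual after message passing
        return nodes + self.ffn(nodes)  # residual after feed-forward
```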
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism that generates phrase representations from the corresponding token representations.
In our experiments, we obtain significant improvements on the WMT14 English-German and English-French tasks on top of the strong Transformer baseline.
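A minimal sketch of attentive pooling from token vectors to a phrase vector (the scoring network here is an assumption; the paper's generator may be richer):

```python
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    """Generate one phrase representation from its token representations."""

    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(d_model, 1)  # learned relevance score per token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (phrase_len, d_model) -- the tokens of one source phrase
        weights = torch.softmax(self.score(tokens), dim=0)  # (phrase_len, 1)
        return (weights * tokens).sum(dim=0)                # (d_model,)

pool = AttentivePhrasePooling(d_model=512)
phrase_vec = pool(torch.randn(4, 512))  # a 4-token phrase -> one vector
```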
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Multiscale Collaborative Deep Models for Neural Machine Translation [40.52423993051359]
We present a MultiScale Collaborative (MSC) framework to ease the training of NMT models that are substantially deeper than those used previously.
We explicitly boost the gradient back-propagation from top to bottom levels by introducing a block-scale collaboration mechanism into deep NMT models.
Our deep MSC achieves a BLEU score of 30.56 on the WMT14 English-German task, significantly outperforming state-of-the-art deep NMT models.
arXiv Detail & Related papers (2020-04-29T08:36:08Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
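As a data-flow sketch only: each step picks a random target language and backtranslates with the current model to synthesize a training pair. `model.translate`, `model.loss`, and the batch fields are hypothetical placeholders, not a real API.

```python
import random

LANGS = ["de", "fr", "zh", "ru"]  # illustrative language set

def online_backtranslation_step(model, batch):
    # Pick a random target language (possibly one never paired with the
    # batch's language during ordinary training).
    lang = random.choice(LANGS)
    # Backtranslate the batch's target side with the current model to
    # synthesize a new source side (hypothetical API).
    synthetic_src = model.translate(batch.tgt, to_lang=lang)
    # Train on the synthetic pair: synthetic source -> original target.
    loss = model.loss(src=synthetic_src, tgt=batch.tgt)
    loss.backward()
    return loss
```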
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of any information it provides and is not responsible for any consequences arising from its use.