Rethinking Zero-shot Neural Machine Translation: From a Perspective of
Latent Variables
- URL: http://arxiv.org/abs/2109.04705v1
- Date: Fri, 10 Sep 2021 07:18:53 GMT
- Title: Rethinking Zero-shot Neural Machine Translation: From a Perspective of
Latent Variables
- Authors: Weizhi Wang, Zhirui Zhang, Yichao Du, Boxing Chen, Jun Xie, Weihua Luo
- Abstract summary: We introduce a denoising autoencoder objective based on the pivot language into the traditional training objective to improve translation accuracy in zero-shot directions.
We demonstrate that the proposed method effectively eliminates the spurious correlations and significantly outperforms state-of-the-art methods by a remarkable margin.
- Score: 28.101782382170306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot translation, directly translating between language pairs unseen in
training, is a promising capability of multilingual neural machine translation
(NMT). However, it usually suffers from capturing spurious correlations between
the output language and language-invariant semantics due to the maximum
likelihood training objective, leading to poor transfer performance on
zero-shot translation. In this paper, we introduce a denoising autoencoder
objective based on the pivot language into the traditional training objective
to improve translation accuracy in zero-shot directions. A theoretical
analysis from the perspective of latent variables shows that our approach
implicitly maximizes the translation probability of zero-shot directions. On
two benchmark machine translation datasets, we demonstrate that the proposed
method effectively eliminates the spurious correlations and significantly
outperforms state-of-the-art methods by a remarkable margin. Our code is
available at https://github.com/Victorwz/zs-nmt-dae.
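As a rough illustration of the combined objective the abstract describes, the sketch below adds a pivot-language denoising-autoencoder term to the usual translation cross-entropy. The noise function, the dae_weight hyperparameter, and the model/batch interface are illustrative assumptions, not the authors' implementation; see the linked repository for the real code.

```python
# Minimal sketch: multilingual MT loss + pivot-language DAE loss.
# All model/batch attributes here are assumptions for illustration.
import random
import torch.nn.functional as F

def add_noise(tokens, drop_prob=0.1, max_shuffle_dist=3):
    """Corrupt a pivot-language sentence: random token drops plus a local shuffle."""
    kept = [t for t in tokens if random.random() > drop_prob]
    keys = [i + random.uniform(0, max_shuffle_dist) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

def training_loss(model, batch, dae_weight=1.0):
    # (1) Supervised term on pivot-centric parallel data (e.g. de->en, en->fr).
    #     logits: (batch, time, vocab)
    logits = model(src=batch.src, tgt_lang=batch.tgt_lang, tgt_in=batch.tgt_in)
    mt_loss = F.cross_entropy(logits.transpose(1, 2), batch.tgt_out,
                              ignore_index=model.pad_id)

    # (2) DAE term: reconstruct the clean pivot (English) sentence from a
    #     noised copy, so the decoder learns to trust the target-language tag
    #     rather than spurious cues tied to the source language.
    noised = model.collate([add_noise(s) for s in batch.pivot_tokens])
    logits = model(src=noised, tgt_lang="en", tgt_in=batch.pivot_in)
    dae_loss = F.cross_entropy(logits.transpose(1, 2), batch.pivot_out,
                               ignore_index=model.pad_id)

    return mt_loss + dae_weight * dae_loss
```

Viewed through the latent-variable lens of the abstract, the DAE term can be read as tightening a bound on the zero-shot translation probability in which the pivot sentence plays the role of the latent variable.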
Related papers
- Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual
Translation of Dravidian Languages [0.34998703934432673]
We build a single-decoder neural machine translation system for Dravidian-Dravidian multilingual translation.
Our model achieves scores within 3 BLEU of large-scale pivot-based models when it is trained on 50% of the language directions.
arXiv Detail & Related papers (2023-08-10T13:38:09Z)
- Understanding and Mitigating the Uncertainty in Zero-Shot Translation [92.25357943169601]
We aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation.
We propose two lightweight and complementary approaches to denoise the training data for model training.
Our approaches significantly improve the performance of zero-shot translation over strong MNMT baselines.
arXiv Detail & Related papers (2022-05-20T10:29:46Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
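The entry above names the technique but not its mechanics. A plausible core step for denoising entity pre-training, sketched under stated assumptions, is to mask entity mentions found via a knowledge-base gazetteer in monolingual text and train a seq2seq model to restore them; the gazetteer lookup and sentinel format below are assumptions, not the paper's specification.

```python
# Hypothetical entity-noising step for denoising entity pre-training:
# mask entity mentions (matched against a knowledge-base gazetteer) in
# monolingual text; a seq2seq model is then trained to reconstruct the
# original sentence. Gazetteer lookup and sentinels are assumptions.
from typing import List, Set, Tuple

def noise_entities(tokens: List[str], gazetteer: Set[str],
                   max_span: int = 3) -> Tuple[List[str], List[str]]:
    """Return (noised_source, clean_target) for one monolingual sentence."""
    noised, i, n_masked = [], 0, 0
    while i < len(tokens):
        hit = 0
        # Greedy longest match against the gazetteer, up to max_span tokens.
        for k in range(min(max_span, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + k]) in gazetteer:
                hit = k
                break
        if hit:
            noised.append(f"<ent_{n_masked}>")  # sentinel replaces the entity
            n_masked += 1
            i += hit
        else:
            noised.append(tokens[i])
            i += 1
    return noised, tokens  # the model learns to restore the masked entities

# Example: with "New York" in the gazetteer,
# "I flew to New York today" -> "I flew to <ent_0> today"
```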
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
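For orientation, the textbook form of a distributionally robust objective over N language pairs is a min-max over a weight simplex; the summary above does not give the paper's exact formulation, so the following is the generic version:

```latex
% \ell_i(\theta): expected loss on language pair i; \Delta_N: probability simplex.
% An iterated best-response scheme alternates between the \theta and \lambda updates.
\min_{\theta} \; \max_{\lambda \in \Delta_N} \; \sum_{i=1}^{N} \lambda_i \, \ell_i(\theta)
```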
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
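Reading "latent translations" literally, a translate-then-classify pipeline folded into one model can be written as a marginalization over an unobserved translation; this generic form (not necessarily the paper's exact parameterization) is:

```latex
% x: source-language input; t: latent translation (e.g. into English); y: task label.
% The sum over t is intractable and is typically approximated by sampling a few
% translations from the translation model P_{\mathrm{mt}}.
P(y \mid x) \;=\; \sum_{t} P_{\mathrm{cls}}(y \mid t)\, P_{\mathrm{mt}}(t \mid x)
```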
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Improving Zero-Shot Translation by Disentangling Positional Information [24.02434897109097]
We show that a main factor causing the language-specific representations is the positional correspondence to input tokens.
We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
arXiv Detail & Related papers (2020-12-30T12:20:41Z)
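One concrete way to weaken the positional correspondence the last entry identifies is to drop the residual connection around self-attention in a single middle encoder layer, so that outputs are no longer tied one-to-one to input positions. The sketch below illustrates that idea; treating it as the paper's exact recipe is an assumption.

```python
# Illustrative Transformer encoder layer with the self-attention residual
# removable (remove_residual=True), which breaks the strict positional
# correspondence between inputs and outputs. Applying this to one middle
# layer is a plausible reading of the entry above, not a verified recipe.
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, remove_residual=False):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.remove_residual = remove_residual

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        # Without the residual path, each position is a pure mixture of
        # attention values, so outputs need not align position-wise with inputs.
        x = self.norm1(attn_out if self.remove_residual else x + attn_out)
        return self.norm2(x + self.ffn(x))
```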