Improving Multilingual Translation by Representation and Gradient
Regularization
- URL: http://arxiv.org/abs/2109.04778v1
- Date: Fri, 10 Sep 2021 10:52:21 GMT
- Title: Improving Multilingual Translation by Representation and Gradient
Regularization
- Authors: Yilin Yang, Akiko Eriguchi, Alexandre Muzio, Prasad Tadepalli, Stefan
Lee and Hany Hassan
- Abstract summary: We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
- Score: 82.42760103045083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual Neural Machine Translation (NMT) enables one model to serve all
translation directions, including ones that are unseen during training, i.e.
zero-shot translation. Despite being theoretically attractive, current models
often produce low quality translations -- commonly failing to even produce
outputs in the right target language. In this work, we observe that off-target
translation is dominant even in strong multilingual systems, trained on massive
multilingual corpora. To address this issue, we propose a joint approach to
regularize NMT models at both representation-level and gradient-level. At the
representation level, we leverage an auxiliary target language prediction task
to regularize decoder outputs to retain information about the target language.
At the gradient level, we leverage a small amount of direct data (in thousands
of sentence pairs) to regularize model gradients. Our results demonstrate that
our approach is highly effective in both reducing off-target translation
occurrences and improving zero-shot translation performance by +5.59 and +10.38
BLEU on WMT and OPUS datasets respectively. Moreover, experiments show that our
method also works well when the small amount of direct data is not available.
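The abstract names the two regularizers but not their exact form. Below is a minimal PyTorch-style sketch of how they could be realized: an auxiliary target-language prediction head over decoder states, plus a gradient adjustment that uses a small direct-data batch. The mean-pooling, the weight lambda_tlp, and the PCGrad-style projection are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code) of representation- and gradient-level regularization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetLanguagePredictor(nn.Module):
    """Auxiliary head that predicts the target-language ID from decoder states."""
    def __init__(self, d_model: int, num_languages: int):
        super().__init__()
        self.proj = nn.Linear(d_model, num_languages)

    def forward(self, decoder_states: torch.Tensor) -> torch.Tensor:
        # decoder_states: (batch, tgt_len, d_model); mean-pool over time steps (assumption).
        return self.proj(decoder_states.mean(dim=1))

def representation_loss(nmt_loss, decoder_states, tgt_lang_ids, tlp_head, lambda_tlp=0.1):
    """Representation-level regularization: add a target-language prediction (TLP)
    loss so decoder outputs retain information about the target language."""
    tlp_loss = F.cross_entropy(tlp_head(decoder_states), tgt_lang_ids)
    return nmt_loss + lambda_tlp * tlp_loss

def gradient_regularized_backward(model, loss_multilingual, loss_direct):
    """Gradient-level regularization (assumed PCGrad-style projection): use the
    gradient of a small direct-data batch to strip the conflicting component
    from the main multilingual gradient before the optimizer step."""
    params = list(model.parameters())
    g_main = torch.autograd.grad(loss_multilingual, params, retain_graph=True)
    g_dir = torch.autograd.grad(loss_direct, params, retain_graph=True)
    flat_main = torch.cat([g.reshape(-1) for g in g_main])
    flat_dir = torch.cat([g.reshape(-1) for g in g_dir])
    dot = torch.dot(flat_main, flat_dir)
    if dot < 0:  # the two gradients conflict: project the conflict away
        flat_main = flat_main - (dot / flat_dir.norm() ** 2) * flat_dir
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = flat_main[offset:offset + n].view_as(p)  # hand back to the optimizer
        offset += n
```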
Related papers
- On the Pareto Front of Multilingual Neural Machine Translation [123.94355117635293]
We study how the performance of a given direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT).
We propose the Double Power Law to predict the unique performance trade-off front in MNMT.
In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget.
arXiv Detail & Related papers (2023-04-06T16:49:19Z)
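The Pareto-front entry above fits a Double Power Law to predict how a direction's score varies with its sampling ratio. A minimal sketch of the idea follows, using a stand-in two-term power-law form and synthetic pilot measurements; the paper's actual parameterization and fitting procedure may differ.

```python
# Sketch: fit a parametric law to a few pilot runs, then predict untried
# sampling ratios to trace the performance trade-off front cheaply.
import numpy as np
from scipy.optimize import curve_fit

def double_power_law(p, c, a1, b1, a2, b2):
    # Assumed stand-in form: a performance ceiling minus two power-law penalties.
    return c - a1 * np.power(p, -b1) - a2 * np.power(1.0 - p, -b2)

# Synthetic "pilot run" measurements of (sampling ratio, BLEU) for one direction.
rng = np.random.default_rng(0)
ratios = np.linspace(0.05, 0.9, 12)
bleus = double_power_law(ratios, 30.0, 1.2, 0.4, 0.8, 0.6) + rng.normal(0, 0.2, ratios.size)

# Fit the law, then predict ratios that were never trained.
params, _ = curve_fit(double_power_law, ratios, bleus,
                      p0=[25.0, 1.0, 0.5, 1.0, 0.5], maxfev=50000)
print(double_power_law(np.array([0.3, 0.5, 0.7]), *params))
```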
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
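The distributionally robust entry above alternates between the model and an adversary over language-pair weights. Below is a minimal sketch of such an iterated best response loop; the exponentiated-gradient simplex update, the step size eta, and the sample_batch/loss_fn hooks are assumptions, as the paper defines its own uncertainty set and best-response computation.

```python
# Sketch: iterated best response for distributionally robust multilingual training.
import torch

def drmt_train(model, optimizer, sample_batch, loss_fn, num_langs, steps=1000, eta=0.1):
    """sample_batch(i) returns a batch for language pair i; loss_fn(model, batch)
    returns a scalar NMT loss. Both are hypothetical hooks."""
    weights = torch.full((num_langs,), 1.0 / num_langs)  # start from the uniform mixture
    for _ in range(steps):
        losses = torch.stack([loss_fn(model, sample_batch(i)) for i in range(num_langs)])
        # Model's best response: one optimizer step on the currently weighted loss.
        optimizer.zero_grad()
        torch.dot(weights, losses).backward()
        optimizer.step()
        # Adversary's best response (assumed exponentiated-gradient update):
        # shift weight toward the language pairs with the highest current loss.
        with torch.no_grad():
            weights = weights * torch.exp(eta * losses.detach())
            weights = weights / weights.sum()  # project back onto the simplex
    return weights
```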
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
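The latent-translation entry above folds translate-then-classify into one model by treating the translation as a latent variable. A minimal sketch follows; sampling K candidates and averaging class probabilities weighted by translation scores is an assumption standing in for the paper's exact training and inference procedure.

```python
# Sketch: classify a non-English input by marginalizing over sampled translations.
import torch

def classify_with_latent_translation(translator, classifier, src_sentence, k=5):
    # translator.sample: hypothetical hook returning (translation, log_prob);
    # classifier: hypothetical hook mapping a translation to class probabilities.
    samples = [translator.sample(src_sentence) for _ in range(k)]
    log_probs = torch.stack([lp for _, lp in samples])               # (k,)
    weights = torch.softmax(log_probs, dim=0)                        # normalize over samples
    class_probs = torch.stack([classifier(t) for t, _ in samples])   # (k, num_classes)
    # Marginalize the latent translation: p(y|x) ~= sum_k w_k * p(y|t_k).
    return (weights.unsqueeze(1) * class_probs).sum(dim=0)
```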
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T12:20:41Z)
- Improving Zero-Shot Translation by Disentangling Positional Information [24.02434897109097]
We show that a main factor behind language-specific representations is the positional correspondence to input tokens.
We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
arXiv Detail & Related papers (2020-12-30T12:20:41Z)
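The positional-disentanglement entry above points to the positional correspondence between encoder states and input tokens as a culprit. One way such a correspondence can be relaxed, sketched below, is to drop the residual connection around self-attention in a single encoder layer; this mechanism and the layer choice are assumptions, and the paper's exact formulation may differ.

```python
# Sketch: an encoder layer whose residual connection around self-attention
# can be disabled, weakening the one-to-one positional anchoring.
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, nhead=8, keep_residual=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)
        self.keep_residual = keep_residual

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        # Without the residual, position i's output is no longer anchored to
        # input token i and can summarize the whole sentence.
        x = self.norm1(attn_out + x if self.keep_residual else attn_out)
        return self.norm2(x + self.ffn(x))

# e.g., drop the residual only in a middle encoder layer (illustrative choice):
layers = [EncoderLayer(keep_residual=(i != 3)) for i in range(6)]
```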
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
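The last entry proposes random online backtranslation to give unseen directions some supervision. A minimal sketch of one training step follows; the model.translate and model.train_step hooks and the tuple-based batch format are hypothetical stand-ins for a real training loop.

```python
# Sketch: random online backtranslation (ROBT) to cover zero-shot directions.
import random

def robt_step(model, batch, languages):
    """batch: list of (src_sent, src_lang, tgt_sent, tgt_lang) tuples."""
    synthetic = []
    for src, src_lang, tgt, tgt_lang in batch:
        z = random.choice([l for l in languages if l != tgt_lang])
        # Back-translate the target sentence into a random language z with the
        # current model (generation only, no gradients needed here).
        pivot = model.translate(tgt, src_lang=tgt_lang, tgt_lang=z)
        # Train on the synthetic pair for the (possibly unseen) direction z -> tgt_lang.
        synthetic.append((pivot, z, tgt, tgt_lang))
    return model.train_step(batch + synthetic)
```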
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.