Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation
- URL: http://arxiv.org/abs/2002.12867v1
- Date: Fri, 28 Feb 2020 17:05:55 GMT
- Title: Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation
- Authors: Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
- Abstract summary: Back-translation is an approach to exploit monolingual corpora in Neural Machine Translation (NMT).
In this paper, we analyze the role that pre-training plays in iterative back-translation.
We show that, although the quality of the initial system does affect final performance, its effect is relatively small.
- Score: 48.26374127723598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Back-translation provides a simple yet effective approach to exploit
monolingual corpora in Neural Machine Translation (NMT). Its iterative variant,
where two opposite NMT models are jointly trained by alternately using a
synthetic parallel corpus generated by the reverse model, plays a central role
in unsupervised machine translation. In order to start producing sound
translations and provide a meaningful training signal to each other, existing
approaches rely on either a separate machine translation system to warm up the
iterative procedure, or some form of pre-training to initialize the weights of
the model. In this paper, we analyze the role that such initialization plays in
iterative back-translation. Is the behavior of the final system heavily
dependent on it? Or does iterative back-translation converge to a similar
solution given any reasonable initialization? Through a series of empirical
experiments over a diverse set of warmup systems, we show that, although the
quality of the initial system does affect final performance, its effect is
relatively small, as iterative back-translation has a strong tendency to
converge to a similar solution. As such, the margin of improvement left for
the initialization method is narrow, suggesting that future research should
focus more on improving the iterative mechanism itself.
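The alternating procedure described in the abstract can be summarized in a short sketch. The following is a minimal, framework-agnostic illustration of iterative back-translation, not the authors' implementation; `translate`, `train_step`, and the warmed-up initial models are hypothetical placeholders standing in for a full NMT toolkit.

```python
# Minimal sketch of iterative back-translation between two opposite NMT
# models (src->tgt and tgt->src). All callables are hypothetical stand-ins
# for a real NMT toolkit; this is an illustration, not the paper's code.

def iterative_back_translation(model_fwd, model_bwd, mono_src, mono_tgt,
                               translate, train_step, n_iterations=5):
    for _ in range(n_iterations):
        # Back-translate target-language monolingual text with the tgt->src
        # model to build a synthetic parallel corpus for the src->tgt model.
        synthetic_src = [translate(model_bwd, t) for t in mono_tgt]
        model_fwd = train_step(model_fwd, list(zip(synthetic_src, mono_tgt)))

        # Back-translate source-language monolingual text with the updated
        # src->tgt model to train the reverse direction on (input, output) pairs.
        synthetic_tgt = [translate(model_fwd, s) for s in mono_src]
        model_bwd = train_step(model_bwd, list(zip(synthetic_tgt, mono_src)))
    return model_fwd, model_bwd
```

The initialization question studied in the paper concerns where `model_fwd` and `model_bwd` come from before the first iteration: a separate warmup MT system or some form of pre-trained weights.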
Related papers
- Principled Paraphrase Generation with Parallel Corpora [52.78059089341062]
We formalize the implicit similarity function induced by round-trip Machine Translation.
We show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation.
We design an alternative similarity metric that mitigates this issue.
arXiv Detail & Related papers (2022-05-24T17:22:42Z)
- Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation [48.50842995206353]
We study the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT.
We propose simple and effective strategies, named in-domain pretraining and input adaptation to remedy the domain and objective discrepancies.
arXiv Detail & Related papers (2022-03-16T07:36:28Z)
- Unsupervised Neural Machine Translation with Generative Language Models Only [19.74865387759671]
We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models.
Our method consists of three steps: few-shot amplification, distillation, and backtranslation.
arXiv Detail & Related papers (2021-10-11T17:35:34Z)
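Based only on the three steps named in the summary above, the pipeline might be chained roughly as follows. This is a loose, assumption-laden sketch: `few_shot_translate`, `fine_tune`, and `iterative_back_translation` are hypothetical placeholders, not the paper's actual method or API.

```python
# Loose sketch of a "few-shot amplification -> distillation -> back-translation"
# pipeline, inferred only from the one-line summary above. Every helper is a
# hypothetical placeholder, not the paper's implementation.

def unsupervised_nmt_from_lm(lm, mono_src, mono_tgt, few_shot_examples,
                             few_shot_translate, fine_tune,
                             iterative_back_translation):
    # 1) Few-shot amplification: prompt the pre-trained language model with a
    #    handful of example translations to produce synthetic parallel data.
    synthetic = [(s, few_shot_translate(lm, s, few_shot_examples))
                 for s in mono_src]
    # 2) Distillation: fine-tune translation models on the synthetic corpus.
    model_fwd = fine_tune(lm, synthetic)                        # src -> tgt
    model_bwd = fine_tune(lm, [(t, s) for s, t in synthetic])   # tgt -> src
    # 3) Back-translation: refine both directions on monolingual data.
    return iterative_back_translation(model_fwd, model_bwd, mono_src, mono_tgt)
```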
- Learning Kernel-Smoothed Machine Translation with Retrieved Examples [30.17061384497846]
Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfitting the retrieved examples.
We propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online.
arXiv Detail & Related papers (2021-09-21T06:42:53Z)
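The kernel-smoothing idea behind KSTER can be illustrated numerically: interpolate the model's next-token distribution with a kernel-weighted distribution built from retrieved (context, target-token) examples. The Gaussian kernel, the fixed mixing weight, and the toy datastore below are illustrative assumptions; KSTER itself learns these components, so the exact formulation differs.

```python
import math

# Illustrative sketch of kernel-smoothed prediction over retrieved examples.
# The kernel, bandwidth and fixed mixing weight are assumptions for
# illustration; KSTER learns these components rather than fixing them.

def kernel_smoothed_distribution(model_probs, query, retrieved,
                                 bandwidth=1.0, mix=0.5):
    """Mix model token probabilities with a distribution induced by
    retrieved (hidden_state, target_token) examples."""
    weights, norm = {}, 0.0
    for hidden, token in retrieved:
        # Gaussian kernel over the squared distance between context vectors.
        dist2 = sum((q - h) ** 2 for q, h in zip(query, hidden))
        w = math.exp(-dist2 / (2 * bandwidth ** 2))
        weights[token] = weights.get(token, 0.0) + w
        norm += w
    retrieval_probs = {t: w / norm for t, w in weights.items()} if norm else {}
    vocab = set(model_probs) | set(retrieval_probs)
    return {t: (1 - mix) * model_probs.get(t, 0.0)
               + mix * retrieval_probs.get(t, 0.0) for t in vocab}

# Toy usage with 2-dimensional "context vectors" and three retrieved neighbours.
smoothed = kernel_smoothed_distribution(
    model_probs={"cat": 0.6, "dog": 0.4},
    query=[0.1, 0.9],
    retrieved=[([0.1, 0.8], "cat"), ([0.9, 0.1], "dog"), ([0.2, 0.9], "cat")],
)
```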
- Meta Back-translation [111.87397401837286]
We propose a novel method to generate pseudo-parallel data from a pre-trained back-translation model.
Our method is a meta-learning algorithm which adapts a pre-trained back-translation model so that the pseudo-parallel data it generates would train a forward-translation model to do well on a validation set.
arXiv Detail & Related papers (2021-02-15T20:58:32Z)
- Modeling Voting for System Combination in Machine Translation [92.09572642019145]
We propose an approach to modeling voting for system combination in machine translation.
Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training.
arXiv Detail & Related papers (2020-07-14T09:59:38Z)
- Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation [5.480070710278571]
We introduce a method to improve black-box machine translation systems via automatic pre-processing (APP) using sentence simplification.
We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system.
We show that this preprocessing leads to better translation performance as compared to non-preprocessed source sentences.
arXiv Detail & Related papers (2020-05-22T14:15:53Z)
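The preprocessing pipeline summarized above can be sketched at a high level: build an in-domain paraphrase corpus by round-tripping sentences through the black-box MT system, train a simplification model on it, and simplify each source sentence before translating it. `round_trip`, `train_seq2seq`, and `blackbox_translate` are hypothetical placeholders, and the exact pairing used to train the simplifier is an assumption, not the paper's specification.

```python
# Rough sketch of simplify-then-translate automatic preprocessing (APP),
# inferred from the summary above. All helpers are hypothetical placeholders.

def build_paraphrase_corpus(in_domain_sentences, round_trip):
    # Round-trip each in-domain sentence through the black-box MT system
    # (src -> pivot -> src) to obtain paraphrase pairs.
    return [(sentence, round_trip(sentence)) for sentence in in_domain_sentences]

def simplify_then_translate(source_sentences, in_domain_sentences,
                            round_trip, train_seq2seq, blackbox_translate):
    corpus = build_paraphrase_corpus(in_domain_sentences, round_trip)
    # Train a model that rewrites a sentence into a simpler paraphrase
    # (which side of each pair counts as "simpler" is an assumption here).
    simplifier = train_seq2seq(corpus)
    # Automatic preprocessing: simplify the source, then translate it with
    # the unchanged black-box MT system.
    return [blackbox_translate(simplifier(s)) for s in source_sentences]
```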
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
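The regularization term described in the last entry can be written as an auxiliary divergence between the translation model's output distribution and the language-model prior, added to the usual cross-entropy loss. The PyTorch-style sketch below assumes a KL term with a temperature and a weighting factor; the exact divergence direction, temperature, and weighting used in the paper may differ.

```python
import torch.nn.functional as F

# Hedged sketch of training with an LM prior: translation cross-entropy plus
# a term pulling the TM's output distribution toward the pre-trained LM's
# distribution over the same target prefixes. The precise formulation in the
# paper (divergence direction, temperature, weighting) may differ.

def loss_with_lm_prior(tm_logits, lm_logits, target_ids, pad_id,
                       weight=0.5, tau=2.0):
    """tm_logits, lm_logits: (batch, seq, vocab); target_ids: (batch, seq)."""
    # Standard translation cross-entropy on the parallel data.
    ce = F.cross_entropy(tm_logits.reshape(-1, tm_logits.size(-1)),
                         target_ids.reshape(-1),
                         ignore_index=pad_id)
    # Distillation-style regularizer: KL between the LM prior and the TM
    # output, both softened with temperature tau. Padding positions are not
    # masked in this simplified version.
    kl = F.kl_div(F.log_softmax(tm_logits / tau, dim=-1),
                  F.softmax(lm_logits / tau, dim=-1),
                  reduction="batchmean")
    return ce + weight * kl
```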
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.