Unsupervised Neural Machine Translation with Generative Language Models Only
- URL: http://arxiv.org/abs/2110.05448v1
- Date: Mon, 11 Oct 2021 17:35:34 GMT
- Title: Unsupervised Neural Machine Translation with Generative Language Models Only
- Authors: Jesse Michael Han, Igor Babuschkin, Harrison Edwards, Arvind
Neelakantan, Tao Xu, Stanislas Polu, Alex Ray, Pranav Shyam, Aditya Ramesh,
Alec Radford, Ilya Sutskever
- Abstract summary: We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models.
Our method consists of three steps: few-shot amplification, distillation, and backtranslation.
- Score: 19.74865387759671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show how to derive state-of-the-art unsupervised neural machine
translation systems from generatively pre-trained language models. Our method
consists of three steps: few-shot amplification, distillation, and
backtranslation. We first use the zero-shot translation ability of large
pre-trained language models to generate translations for a small set of
unlabeled sentences. We then amplify these zero-shot translations by using them
as few-shot demonstrations for sampling a larger synthetic dataset. This
dataset is distilled by discarding the few-shot demonstrations and then
fine-tuning. During backtranslation, we repeatedly generate translations for a
set of inputs and then fine-tune a single language model on both directions of
the translation task at once, ensuring cycle-consistency by swapping the roles
of gold monotext and generated translations when fine-tuning. By using our
method to leverage GPT-3's zero-shot translation capability, we achieve a new
state-of-the-art in unsupervised translation on the WMT14 English-French
benchmark, attaining a BLEU score of 42.1.
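
To make the three-step recipe concrete, below is a minimal Python sketch of few-shot amplification, distillation, and backtranslation with role swapping. The StubLM class, the generate/fine_tune interface, the prompt formats, and the toy data are illustrative assumptions for exposition only, not the authors' implementation or any real API.

```python
# Hedged sketch of the pipeline described in the abstract; all interfaces are placeholders.
import random

class StubLM:
    """Placeholder standing in for a large pre-trained LM (e.g. GPT-3)."""
    def generate(self, prompt):
        return "<sampled continuation>"   # a real model would be queried here
    def fine_tune(self, examples):
        return self                       # a real model would be updated here

def zero_shot_translate(model, sentence, src="English", tgt="French"):
    """Step 0: prompt the pre-trained LM zero-shot for a translation."""
    prompt = f"Translate {src} to {tgt}.\n{src}: {sentence}\n{tgt}:"
    return model.generate(prompt)

def few_shot_amplify(model, seed_pairs, monotext, k=3):
    """Step 1 (few-shot amplification): reuse the zero-shot translations as
    few-shot demonstrations to sample a larger synthetic dataset."""
    synthetic = []
    for sentence in monotext:
        demos = random.sample(seed_pairs, k)
        prompt = "".join(f"English: {e}\nFrench: {f}\n" for e, f in demos)
        prompt += f"English: {sentence}\nFrench:"
        synthetic.append((sentence, model.generate(prompt)))
    return synthetic

def distill(model, synthetic_pairs):
    """Step 2 (distillation): discard the demonstrations and fine-tune on the
    bare (source, translation) pairs."""
    examples = [f"English: {src}\nFrench: {tgt}" for src, tgt in synthetic_pairs]
    return model.fine_tune(examples)

def backtranslate(model, en_monotext, fr_monotext, rounds=2):
    """Step 3 (backtranslation): generate translations in both directions and
    fine-tune a single model on both tasks at once; cycle-consistency comes from
    using the generated translation as source and the gold monotext as target."""
    for _ in range(rounds):
        en_to_fr = [(en, zero_shot_translate(model, en, "English", "French"))
                    for en in en_monotext]
        fr_to_en = [(fr, zero_shot_translate(model, fr, "French", "English"))
                    for fr in fr_monotext]
        examples = (
            [f"French: {fr_hat}\nEnglish: {en}" for en, fr_hat in en_to_fr]    # fr -> en
            + [f"English: {en_hat}\nFrench: {fr}" for fr, en_hat in fr_to_en]  # en -> fr
        )
        model = model.fine_tune(examples)
    return model

# Illustrative end-to-end flow on toy data.
model = StubLM()
seeds = [(en, zero_shot_translate(model, en)) for en in ["Hello.", "Thank you.", "Good night."]]
synthetic = few_shot_amplify(model, seeds, ["The cat sleeps.", "It is raining."], k=2)
model = distill(model, synthetic)
model = backtranslate(model, ["The cat sleeps."], ["Il pleut."])
```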
Related papers
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation [0.0]
Large pre-trained language models have brought remarkable progress in NLP.
We empirically demonstrate the effectiveness of self-supervised pre-training and data augmentation for zero-shot multi-lingual machine translation.
arXiv Detail & Related papers (2022-10-01T17:01:30Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
- Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work so that they can learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)
- Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation [5.958653653305609]
We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences (a toy sketch of this idea appears after this list).
This automatically expands the vocabulary of the model while maintaining high-quality content.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)
- Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)
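
As referenced above, the bilingual-dictionary idea from the semi-supervised entry can be pictured with a toy word-by-word substitution; the tiny dictionary, whitespace tokenization, and fallback behaviour below are placeholder assumptions, not that paper's actual procedure.

```python
# Toy illustration of word-by-word synthetic data from a bilingual dictionary.
def word_by_word_translate(sentence, dictionary):
    """Replace each source word with its dictionary entry; keep unknown
    words unchanged so rare vocabulary still reaches the target side."""
    return " ".join(dictionary.get(tok.lower(), tok) for tok in sentence.split())

en_fr = {"the": "le", "cat": "chat", "sleeps": "dort"}   # placeholder dictionary
monolingual = ["The cat sleeps", "The dog sleeps"]

# Pair each monolingual sentence with its rough synthetic translation.
synthetic_bitext = [(s, word_by_word_translate(s, en_fr)) for s in monolingual]
print(synthetic_bitext)
# [('The cat sleeps', 'le chat dort'), ('The dog sleeps', 'le dog dort')]
```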