Multilingual Translation via Grafting Pre-trained Language Models
- URL: http://arxiv.org/abs/2109.05256v1
- Date: Sat, 11 Sep 2021 10:57:45 GMT
- Title: Multilingual Translation via Grafting Pre-trained Language Models
- Authors: Zewei Sun, Mingxuan Wang and Lei Li
- Abstract summary: We propose Graformer to graft separately pre-trained (masked) language models for machine translation.
With monolingual data for pre-training and parallel data for grafting training, we make maximal use of both types of data.
- Score: 12.787188625198459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can pre-trained BERT for one language and GPT for another be glued together
to translate texts? Self-supervised training using only monolingual data has
led to the success of pre-trained (masked) language models in many NLP tasks.
However, directly connecting BERT as an encoder and GPT as a decoder can be
challenging in machine translation, because GPT-like models lack the
cross-attention component that seq2seq decoders need. In this paper, we propose
Graformer to graft separately pre-trained (masked) language models for machine
translation. With monolingual data for pre-training and parallel data for
grafting training, we make maximal use of both types of data. Experiments on
60 directions show that our method achieves average improvements of 5.8 BLEU
in x2en directions and 2.9 BLEU in en2x directions compared with a multilingual
Transformer of the same size.
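To make the grafting idea concrete, here is a minimal sketch, assuming Hugging Face transformers and illustrative checkpoint names, of gluing a pre-trained masked-LM encoder to a pre-trained causal-LM decoder. This is not the paper's Graformer architecture; it only shows the general pattern the abstract describes, in which the decoder receives newly initialized cross-attention that must then be trained on parallel data.

```python
# Sketch only (not the paper's Graformer): wire a pre-trained BERT encoder to a
# pre-trained GPT-2 decoder. EncoderDecoderModel adds the cross-attention that
# GPT-2 lacks; those weights are newly initialized and must be learned from
# parallel data. Checkpoint names are illustrative assumptions.
from transformers import AutoTokenizer, EncoderDecoderModel

enc_name = "bert-base-multilingual-cased"  # assumed encoder checkpoint
dec_name = "gpt2"                          # assumed decoder checkpoint

enc_tok = AutoTokenizer.from_pretrained(enc_name)
dec_tok = AutoTokenizer.from_pretrained(dec_name)
dec_tok.pad_token = dec_tok.eos_token  # GPT-2 has no pad token by default

model = EncoderDecoderModel.from_encoder_decoder_pretrained(enc_name, dec_name)
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.pad_token_id

# One toy training step on a single parallel pair; real grafting training
# would iterate over a multilingual parallel corpus.
src = enc_tok("Hallo Welt", return_tensors="pt")
tgt = dec_tok("Hello world", return_tensors="pt")
out = model(input_ids=src.input_ids,
            attention_mask=src.attention_mask,
            labels=tgt.input_ids)
out.loss.backward()
```

The sketch mirrors the division of labor in the abstract: the encoder and decoder weights come from monolingual pre-training, and only the connection between them is learned from parallel data.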
Related papers
- Multilingual Bidirectional Unsupervised Translation Through Multilingual
Finetuning and Back-Translation [23.401781865904386]
We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English.
For the first stage, we initialize an encoder-decoder model to pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data in 40 languages to English.
For the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then bidirectionally train with successive rounds of back-translation.
arXiv Detail & Related papers (2022-09-06T21:20:41Z)
- Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing [19.19256927651015]
We describe models that convert monolingual English text into Hinglish (code-mixed Hindi and English).
Given the recent success of pretrained language models, we also test the utility of two recent Transformer-based encoder-decoder models.
Our models place first in the overall ranking of the English-Hinglish official shared task.
arXiv Detail & Related papers (2021-05-18T19:50:25Z)
- Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders [74.89326277221072]
How to improve the cross-lingual transfer of NMT models with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance on many-to-English test sets than CRISS and m2m-100.
arXiv Detail & Related papers (2021-04-18T07:42:45Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when translating directly between non-English directions, while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
- Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation [50.0258495437314]
We investigate the benefits and drawbacks of freezing parameters, and of adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT).
For BART we get the best performance by freezing most of the model parameters and adding extra positional embeddings.
For mBART we match or outperform the performance of naive fine-tuning for most language pairs with the encoder, and most of the decoder, frozen (a minimal freezing sketch appears after this list).
arXiv Detail & Related papers (2020-04-30T16:09:22Z)
- Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
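As a concrete illustration of the freezing recipe mentioned in the "Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation" entry above, here is a minimal sketch for an mBART-style model that freezes the encoder and most of the decoder. The checkpoint and the choice of which decoder submodules remain trainable are assumptions for illustration, not that paper's exact recipe.

```python
# Sketch of the freezing idea from the "Recipes for Adapting Pre-trained ...
# Models to Machine Translation" entry above. Which submodules stay trainable
# is an illustrative assumption, not that paper's exact recipe.
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-cc25")  # assumed checkpoint

# Freeze everything first.
for p in model.parameters():
    p.requires_grad = False

# Unfreeze a small subset: decoder cross-attention and its layer norms
# (illustrative choice of trainable parameters).
for layer in model.model.decoder.layers:
    for p in layer.encoder_attn.parameters():
        p.requires_grad = True
    for p in layer.encoder_attn_layer_norm.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```

Counting trainable versus total parameters at the end is a quick sanity check that the freezing actually took effect before fine-tuning on parallel data.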
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.