Recipes for Adapting Pre-trained Monolingual and Multilingual Models to
Machine Translation
- URL: http://arxiv.org/abs/2004.14911v2
- Date: Mon, 20 Jun 2022 18:49:47 GMT
- Title: Recipes for Adapting Pre-trained Monolingual and Multilingual Models to
Machine Translation
- Authors: Asa Cooper Stickland, Xian Li, Marjan Ghazvininejad
- Abstract summary: We investigate the benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT).
For BART we get the best performance by freezing most of the model parameters, and adding extra positional embeddings.
For mBART we match or outperform the performance of naive fine-tuning for most language pairs with the encoder, and most of the decoder, frozen.
- Score: 50.0258495437314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been recent success in pre-training on monolingual data and
fine-tuning on Machine Translation (MT), but it remains unclear how to best
leverage a pre-trained model for a given MT task. This paper investigates the
benefits and drawbacks of freezing parameters, and adding new ones, when
fine-tuning a pre-trained model on MT. We focus on 1) fine-tuning a model
trained only on English monolingual data, BART, and 2) fine-tuning a model
trained on monolingual data from 25 languages, mBART. For BART we get the best
performance by freezing most of the model parameters, and adding extra
positional embeddings. For mBART we match or outperform the performance of
naive fine-tuning for most language pairs with the encoder, and most of the
decoder, frozen. The encoder-decoder attention parameters are most important to
fine-tune. When constraining ourselves to an out-of-domain training set for
Vietnamese to English, we see the largest improvements over the fine-tuning
baseline.
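As a rough illustration of the freezing recipe described above, the sketch below loads an mBART checkpoint, freezes every parameter, and then unfreezes only the encoder-decoder (cross) attention, which the abstract identifies as the most important parameters to fine-tune. This is a minimal sketch, not the authors' code: it assumes the HuggingFace transformers mBART implementation, where cross-attention modules are named encoder_attn, and it omits the analogous BART recipe (freezing most parameters and adding extra positional embeddings).
```python
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# Freeze everything first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the decoder's cross-attention (and its layer norm). The
# substring "encoder_attn" matches both encoder_attn.* and
# encoder_attn_layer_norm.* under HuggingFace naming (an assumption here).
for name, param in model.named_parameters():
    if "encoder_attn" in name:
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")
```
The unfrozen subset would then be trained with the usual cross-entropy objective on parallel data, with the optimizer given only the parameters that still require gradients.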
Related papers
- Universal Conditional Masked Language Pre-training for Neural Machine
Translation [29.334361879066602]
We propose CeMAT, a conditional masked language model pre-trained on large-scale bilingual and monolingual corpora.
We conduct extensive experiments and show that CeMAT can achieve significant performance improvements across all scenarios.
arXiv Detail & Related papers (2022-03-17T10:00:33Z)
- Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks in the early stage of training and then tune the model normally.
Experiments show DoT consistently improves neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z)
- Multilingual Translation via Grafting Pre-trained Language Models [12.787188625198459]
We propose Graformer to graft separately pre-trained (masked) language models for machine translation.
With monolingual data for pre-training and parallel data for grafting training, we make maximal use of both types of data.
arXiv Detail & Related papers (2021-09-11T10:57:45Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural
Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve fine-tuning performance over the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Multilingual Speech Translation with Efficient Finetuning of Pretrained
Models [82.22294901727933]
A minimalistic LNA (LayerNorm and Attention) finetuning scheme can achieve zero-shot cross-lingual and cross-modality transfer; see the sketch after this list.
Our approach demonstrates strong zero-shot performance in a many-to-many multilingual model.
arXiv Detail & Related papers (2020-10-24T08:15:08Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions in diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)
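The LNA (LayerNorm and Attention) finetuning mentioned in the "Multilingual Speech Translation with Efficient Finetuning of Pretrained Models" entry above follows the same selective-unfreezing pattern as the main paper's recipes. The sketch below is a rough approximation under the same assumptions as before (HuggingFace mBART parameter naming); it is illustrative, not the authors' implementation.
```python
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# Keep only LayerNorm and attention parameters trainable; all other weights
# stay frozen. The name matching assumes HuggingFace BART/mBART conventions
# ("layer_norm", "layernorm", "self_attn", "encoder_attn").
for name, param in model.named_parameters():
    is_layer_norm = "layer_norm" in name or "layernorm" in name
    is_attention = "self_attn" in name or "encoder_attn" in name
    param.requires_grad = is_layer_norm or is_attention
```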