MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
- URL: http://arxiv.org/abs/2110.06609v1
- Date: Wed, 13 Oct 2021 10:06:21 GMT
- Title: MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
- Authors: Zhixing Tan, Xiangwen Zhang, Shuo Wang, Yang Liu
- Abstract summary: We present Multi-Stage Prompting, a simple and lightweight approach for better adapting pre-trained language models to translation tasks.
To make pre-trained language models better translators, we divide the translation process via pre-trained language models into three separate stages.
During each stage, we independently apply different continuous prompts to allow pre-trained language models to better adapt to translation tasks.
- Score: 10.557167523009392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models have recently been shown to be able to perform
translation without finetuning via prompting. Inspired by these findings, we
study improving the performance of pre-trained language models on translation
tasks, where training neural machine translation models is the current de facto
approach. We present Multi-Stage Prompting, a simple and lightweight approach
for better adapting pre-trained language models to translation tasks. To make
pre-trained language models better translators, we divide the translation
process via pre-trained language models into three separate stages: the
encoding stage, the re-encoding stage, and the decoding stage. During each
stage, we independently apply different continuous prompts to allow
pre-trained language models to better adapt to translation tasks. We conduct
extensive experiments on low-, medium-, and high-resource translation tasks.
Experiments show that our method can significantly improve the translation
performance of pre-trained language models.
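The abstract sketches a concrete architecture: a frozen pre-trained language model reused three times, with a separate trainable continuous prompt for the encoding, re-encoding, and decoding stages. The PyTorch sketch below only illustrates that idea; the ToyLM stand-in, the prompt length, and the exact way the stage outputs are chained are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Stand-in for a frozen pre-trained LM (attention masking omitted for brevity).
    A real setup would load a pre-trained GPT-style model instead."""
    def __init__(self, vocab_size=1000, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, inputs_embeds, return_hidden=False):
        h = self.blocks(inputs_embeds)
        return h if return_hidden else self.lm_head(h)

class MultiStagePrompting(nn.Module):
    """Three stages (encode, re-encode, decode), each with its own trainable
    continuous prompt; the language model itself stays frozen."""
    def __init__(self, lm, prompt_len=8, d_model=64):
        super().__init__()
        self.lm = lm
        for p in self.lm.parameters():      # freeze the LM: only the prompts train
            p.requires_grad_(False)
        self.enc_prompt = nn.Parameter(0.02 * torch.randn(prompt_len, d_model))
        self.re_prompt = nn.Parameter(0.02 * torch.randn(prompt_len, d_model))
        self.dec_prompt = nn.Parameter(0.02 * torch.randn(prompt_len, d_model))

    @staticmethod
    def _prepend(prompt, embeds):
        batch = embeds.size(0)
        return torch.cat([prompt.unsqueeze(0).expand(batch, -1, -1), embeds], dim=1)

    def forward(self, src_ids, tgt_ids):
        src_emb, tgt_emb = self.lm.embed(src_ids), self.lm.embed(tgt_ids)
        # Stage 1: encode the source with the encoding-stage prompt.
        enc = self.lm(self._prepend(self.enc_prompt, src_emb), return_hidden=True)
        # Stage 2: re-encode the source representation with the re-encoding prompt.
        re_enc = self.lm(self._prepend(self.re_prompt, enc), return_hidden=True)
        # Stage 3: decode the target conditioned on [decoding prompt; re-encoded source; target].
        logits = self.lm(torch.cat([self._prepend(self.dec_prompt, re_enc), tgt_emb], dim=1))
        return logits[:, -tgt_ids.size(1):]  # logits over the target positions only

lm = ToyLM()
msp = MultiStagePrompting(lm)
src = torch.randint(0, 1000, (2, 12))        # dummy source token ids
tgt = torch.randint(0, 1000, (2, 9))         # dummy target token ids
print(msp(src, tgt).shape)                   # torch.Size([2, 9, 1000])
```

Freezing the language model and training only the three per-stage prompts keeps the number of trainable parameters small, which is what makes this style of prompting approach lightweight.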
Related papers
- XDLM: Cross-lingual Diffusion Language Model for Machine Translation [0.0]
We propose a novel Cross-lingual diffusion model for machine translation, consisting of pretraining and fine-tuning stages.
We evaluate the results on several machine translation benchmarks and outperform both diffusion and Transformer baselines.
arXiv Detail & Related papers (2023-07-25T15:08:34Z)
- Extending the Subwording Model of Multilingual Pretrained Models for New Languages [31.702393348980735]
In this paper, we add new subwords to the SentencePiece tokenizer to apply a multilingual pretrained model to new languages.
In our experiments, we segmented Inuktitut sentences into subwords without changing the segmentation of already pretrained languages.
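A common way to add pieces to an existing SentencePiece model with the sentencepiece Python package is to append them to the serialized model proto, roughly as sketched below. The file names and example pieces are placeholders, the pre-trained model's embedding matrix would also need to be resized to match, and this is not necessarily the exact procedure used in that paper.

```python
import sentencepiece as spm
from sentencepiece import sentencepiece_model_pb2 as sp_pb2

# Load the existing multilingual tokenizer model (placeholder path).
proto = sp_pb2.ModelProto()
with open("multilingual.model", "rb") as f:
    proto.ParseFromString(f.read())

existing = {p.piece for p in proto.pieces}
new_subwords = ["▁ᐃᓄᒃᑎᑐᑦ", "ᑎᑐᑦ"]  # placeholder pieces for the new language

for piece in new_subwords:
    if piece in existing:
        continue                          # leave pieces the model already has untouched
    new_piece = sp_pb2.ModelProto.SentencePiece()
    new_piece.piece = piece
    new_piece.score = 0.0                 # simple choice of score for the appended pieces
    proto.pieces.append(new_piece)

with open("extended.model", "wb") as f:
    f.write(proto.SerializeToString())

# The new pieces only cover characters of the added language, so the
# segmentation of languages already covered by the tokenizer stays the same.
tokenizer = spm.SentencePieceProcessor(model_file="extended.model")
print(tokenizer.encode("ᐃᓄᒃᑎᑐᑦ", out_type=str))
```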
arXiv Detail & Related papers (2022-11-29T06:55:34Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext to improve the cross-lingual transferability of pre-trained models.
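To make "contrastive learning over parallel corpora" concrete, the snippet below shows a generic InfoNCE-style objective over embeddings of translation pairs; it is only an illustration of the general idea and not InfoXLM's exact pre-training task (the batch construction, temperature, and use of dummy embeddings are assumptions).

```python
import torch
import torch.nn.functional as F

def parallel_contrastive_loss(src_vecs, tgt_vecs, temperature=0.07):
    """Generic InfoNCE over a batch of translation pairs: the i-th source
    sentence should score highest against its own translation, with the other
    translations in the batch acting as in-batch negatives."""
    src = F.normalize(src_vecs, dim=-1)
    tgt = F.normalize(tgt_vecs, dim=-1)
    logits = src @ tgt.t() / temperature       # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0))         # positives sit on the diagonal
    # Symmetric loss: match source-to-target and target-to-source.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# Usage with dummy sentence embeddings; in practice they would come from the
# cross-lingual encoder being pre-trained.
src_vecs = torch.randn(8, 256)
tgt_vecs = torch.randn(8, 256)
print(parallel_contrastive_loss(src_vecs, tgt_vecs))
```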
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
- Testing pre-trained Transformer models for Lithuanian news clustering [0.0]
Non-English languages could not leverage such opportunities with models pre-trained on English text.
We compare pre-trained multilingual BERT, XLM-R, and older learned text representation methods as encodings for the task of Lithuanian news clustering.
Our results indicate that publicly available pre-trained multilingual Transformer models can be fine-tuned to surpass word vectors but still score much lower than specially trained doc2vec embeddings.
arXiv Detail & Related papers (2020-04-03T14:41:54Z)
- Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)