Prompt Learning to Mitigate Catastrophic Forgetting in Cross-lingual
Transfer for Open-domain Dialogue Generation
- URL: http://arxiv.org/abs/2305.07393v2
- Date: Mon, 29 May 2023 19:04:00 GMT
- Title: Prompt Learning to Mitigate Catastrophic Forgetting in Cross-lingual
Transfer for Open-domain Dialogue Generation
- Authors: Lei Liu, Jimmy Xiangji Huang
- Abstract summary: We investigate few-shot cross-lingual transfer learning (FS-XLT) and multitask learning (MTL) in the context of open-domain dialogue generation for non-English languages with limited data.
We observed catastrophic forgetting in both FS-XLT and MTL for all 6 languages in our preliminary experiments.
We propose a simple yet effective prompt learning approach that can preserve the multilinguality of the multilingual pre-trained language model.
- Score: 14.68491971816154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue systems for non-English languages have long been under-explored. In
this paper, we take the first step to investigate few-shot cross-lingual
transfer learning (FS-XLT) and multitask learning (MTL) in the context of
open-domain dialogue generation for non-English languages with limited data. We
observed catastrophic forgetting in both FS-XLT and MTL for all 6 languages in
our preliminary experiments. To mitigate the issue, we propose a simple yet
effective prompt learning approach that can preserve the multilinguality of the
multilingual pre-trained language model (mPLM) in FS-XLT and MTL by bridging
the gap between pre-training and fine-tuning with Fixed-prompt LM Tuning and
our hand-crafted prompts. Experimental results on all 6 languages in terms of
both automatic and human evaluations demonstrate the effectiveness of our
approach. Our code is available at https://github.com/JeremyLeiLiu/XLinguDial.
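For a concrete picture of the approach, the snippet below is a minimal sketch of Fixed-prompt LM Tuning for dialogue generation, assuming an mT5 backbone and an illustrative English prompt template; the authors' actual templates, backbone, and hyperparameters live in the repository linked above.

```python
# A minimal sketch of Fixed-prompt LM Tuning: a hand-crafted prompt template stays
# fixed while all model parameters are fine-tuned on few-shot target-language data.
# The model name and prompt wording below are illustrative assumptions.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/mt5-small"  # assumed mPLM backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def build_prompt(context: str) -> str:
    # Hand-crafted (hypothetical) template; it is never updated during training.
    return f"Given the dialogue context: {context} The response is:"

def training_step(context: str, response: str) -> torch.Tensor:
    inputs = tokenizer(build_prompt(context), return_tensors="pt", truncation=True)
    labels = tokenizer(response, return_tensors="pt", truncation=True).input_ids
    # Standard seq2seq cross-entropy; only the LM weights are updated.
    return model(**inputs, labels=labels).loss

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = training_step("Wie geht es dir heute?", "Mir geht es gut, danke!")
loss.backward()
optimizer.step()
```

The idea, per the abstract, is that keeping a fixed natural-language prompt narrows the gap between pre-training and fine-tuning, which is what helps preserve the mPLM's multilinguality.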
Related papers
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- How do Large Language Models Handle Multilingualism? [81.15060972112563]
This study explores how large language models (LLMs) handle multilingualism.
LLMs initially understand the query, converting multilingual inputs into English for task-solving.
In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures.
arXiv Detail & Related papers (2024-02-29T02:55:26Z)
- Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed? [40.13166574854085]
We investigate the minimal amount of multilinguality required to elicit cross-lingual generalisation in English-centric large language models.
We find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation.
arXiv Detail & Related papers (2023-12-20T00:49:52Z)
- How Vocabulary Sharing Facilitates Multilingualism in LLaMA? [19.136382859468693]
Large Language Models (LLMs) often show strong performance on English tasks, while exhibiting limitations on other languages.
This study endeavors to examine the multilingual capability of LLMs from the vocabulary sharing perspective.
arXiv Detail & Related papers (2023-11-15T16:13:14Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts (a generic sketch of soft prompt-tuning is given below).
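As a rough illustration of what prompt-tuning for alignment can look like (the encoder, prompt length, and alignment loss below are assumptions, not the cited paper's method), this sketch freezes a multilingual encoder and trains only a small set of prepended soft-prompt embeddings:

```python
# Illustrative soft prompt-tuning skeleton: the multilingual backbone is frozen and
# only the prepended prompt embeddings receive gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

backbone = AutoModel.from_pretrained("xlm-roberta-base")  # assumed encoder
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
for p in backbone.parameters():
    p.requires_grad = False  # keep the pretrained weights fixed

n_prompt = 16
prompt = nn.Parameter(0.02 * torch.randn(n_prompt, backbone.config.hidden_size))

def encode(text: str) -> torch.Tensor:
    enc = tokenizer(text, return_tensors="pt")
    tok_emb = backbone.embeddings.word_embeddings(enc.input_ids)
    # Prepend the trainable prompt vectors to the token embeddings.
    inputs_embeds = torch.cat([prompt.unsqueeze(0), tok_emb], dim=1)
    attn = torch.cat(
        [torch.ones(1, n_prompt, dtype=enc.attention_mask.dtype), enc.attention_mask],
        dim=1,
    )
    out = backbone(inputs_embeds=inputs_embeds, attention_mask=attn)
    return out.last_hidden_state.mean(dim=1)  # mean-pooled sentence vector

# Stand-in alignment objective: pull a parallel English/German pair together.
loss = 1.0 - F.cosine_similarity(encode("How are you?"), encode("Wie geht es dir?")).mean()
loss.backward()  # gradients reach only `prompt`
```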
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Multilingual Language Model Adaptive Fine-Tuning: A Study on African Languages [19.067718464786463]
We perform multilingual adaptive fine-tuning (MAFT) on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent.
To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT (a rough sketch of such vocabulary trimming is given below).
Our approach is competitive with applying LAFT on individual languages while requiring significantly less disk space.
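A rough sketch of such vocabulary trimming, with an assumed mPLM and a toy Unicode-range filter standing in for the paper's script selection:

```python
# Illustrative vocabulary trimming before adaptive fine-tuning: drop embedding rows
# whose tokens belong to scripts that will not be used. The Unicode-range filter is
# a toy heuristic and the model choice is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed mPLM
model = AutoModel.from_pretrained("xlm-roberta-base")

def keep_token(token: str) -> bool:
    # Toy filter: keep tokens whose characters fall below the Armenian block,
    # i.e. roughly Latin/Greek/Cyrillic letters, digits, and punctuation.
    return all(ord(ch) < 0x0530 for ch in token)

kept_ids = sorted(i for tok, i in tokenizer.get_vocab().items() if keep_token(tok))

old_emb = model.get_input_embeddings()
new_emb = torch.nn.Embedding(len(kept_ids), old_emb.embedding_dim)
new_emb.weight.data = old_emb.weight.data[kept_ids].clone()  # copy surviving rows
model.set_input_embeddings(new_emb)
model.config.vocab_size = len(kept_ids)
# For continued (masked-LM) pretraining, the LM head and the tokenizer would also
# need to be rebuilt so that token ids match the new embedding rows.
```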
arXiv Detail & Related papers (2022-04-13T16:13:49Z)
- Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT [129.99918589405675]
We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq).
arXiv Detail & Related papers (2020-09-16T11:37:10Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language (a minimal sketch of such a loss appears below).
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
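A minimal sketch of a KL-divergence self-teaching term of this kind, assuming a classification head and model-generated soft pseudo-labels (names and shapes are illustrative, not FILTER's implementation):

```python
# Minimal KL self-teaching loss: the model's own predictions on one view (e.g. the
# source-language pass) act as soft pseudo-labels for the translated target-language
# view. Function and variable names are illustrative.
import torch
import torch.nn.functional as F

def self_teaching_kl(student_logits: torch.Tensor,
                     teacher_logits: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) against detached soft pseudo-labels."""
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Example: logits over 3 classes for a batch of 2 translated sentences.
teacher = torch.tensor([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])  # pseudo-label source pass
student = torch.randn(2, 3, requires_grad=True)              # target-language pass
loss = self_teaching_kl(student, teacher)
loss.backward()
```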