Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning
- URL: http://arxiv.org/abs/2304.01295v4
- Date: Sat, 27 Jan 2024 03:47:55 GMT
- Title: Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning
- Authors: Lifu Tu, Jin Qu, Semih Yavuz, Shafiq Joty, Wenhao Liu, Caiming Xiong,
Yingbo Zhou
- Abstract summary: Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
- Score: 98.60739735409243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual transfer of language models trained on high-resource languages
like English has been widely studied for many NLP tasks, but focus on
conversational tasks has been rather limited. This is partly due to the high
cost of obtaining non-English conversational data, which results in limited
coverage. In this work, we introduce XSGD for cross-lingual alignment
pretraining, a parallel and large-scale multilingual conversation dataset that
we created by translating the English-only Schema-Guided Dialogue (SGD) dataset
(Rastogi et al., 2020) into 105 other languages. XSGD contains approximately
330k utterances per language. To facilitate aligned cross-lingual
representations, we develop an efficient prompt-tuning-based method for
learning alignment prompts. We also investigate two different classifiers:
NLI-based and vanilla classifiers, and test cross-lingual capability enabled by
the aligned prompts. We evaluate our model's cross-lingual generalization
capabilities on two conversation tasks: slot-filling and intent classification.
Our results demonstrate the strong and efficient modeling ability of NLI-based
classifiers and the large cross-lingual transfer improvements achieved by our
aligned prompts, particularly in few-shot settings. In addition, we highlight
the favorable results of our approach compared to LLMs such as text-davinci-003 and
ChatGPT in both zero-shot and few-shot settings. While LLMs exhibit impressive
performance in English, their cross-lingual capabilities in other languages,
particularly low-resource languages, are limited.
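As a rough illustration of the prompt-tuning idea described above, the sketch below prepends a small set of trainable soft-prompt vectors to a frozen multilingual encoder and updates only those vectors on a parallel utterance pair. The backbone (xlm-roberta-base), the cosine alignment objective, and the helper names are assumptions chosen for the sketch, not the paper's exact recipe.

```python
# Minimal sketch of prompt-tuning for cross-lingual alignment (illustrative only).
# Only the soft-prompt parameters are trained; the multilingual encoder is frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"   # assumed backbone, for illustration
PROMPT_LEN = 20                   # number of trainable soft-prompt vectors

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
for p in encoder.parameters():    # freeze the backbone; only the prompt is tuned
    p.requires_grad = False

hidden = encoder.config.hidden_size
alignment_prompt = nn.Parameter(torch.randn(PROMPT_LEN, hidden) * 0.02)
optimizer = torch.optim.AdamW([alignment_prompt], lr=1e-3)


def encode(sentences):
    """Prepend the soft prompt to the token embeddings and mean-pool the output."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    tok_emb = encoder.get_input_embeddings()(batch["input_ids"])          # (B, T, H)
    prompt = alignment_prompt.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, tok_emb], dim=1)                   # (B, P+T, H)
    prompt_mask = torch.ones(
        tok_emb.size(0), PROMPT_LEN, dtype=batch["attention_mask"].dtype
    )
    attention_mask = torch.cat([prompt_mask, batch["attention_mask"]], dim=1)
    out = encoder(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
    mask = attention_mask.unsqueeze(-1).float()
    return (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)    # mean pool


# One alignment step on a parallel English / translated utterance pair
# (a stand-in for an XSGD utterance and its translation).
en = ["I would like to book a table for two tonight."]
de = ["Ich möchte heute Abend einen Tisch für zwei reservieren."]
loss = 1.0 - F.cosine_similarity(encode(en), encode(de)).mean()
loss.backward()
optimizer.step()
print(f"alignment loss: {loss.item():.4f}")
```

An NLI-style intent classifier in the same spirit would then score each candidate intent, verbalized as a hypothesis (e.g. "The user wants to book a restaurant."), against the encoded utterance rather than training a fixed softmax head, which is what allows the aligned prompts to transfer with few or no target-language examples.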
Related papers
- Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model [14.39119862985503]
We aim to create a multilingual ALT system with available datasets.
Inspired by architectures that have been proven effective for English ALT, we adapt these techniques to the multilingual scenario.
We evaluate the performance of the multilingual model in comparison to its monolingual counterparts.
arXiv Detail & Related papers (2024-06-25T15:02:32Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Multilingual Relation Classification via Efficient and Effective Prompting [9.119073318043952]
We present the first work on prompt-based multilingual relation classification (RC).
We introduce an efficient and effective method that constructs prompts from relation triples and involves only minimal translation for the class labels.
We evaluate its performance in fully supervised, few-shot and zero-shot scenarios, and analyze its effectiveness across 14 languages.
arXiv Detail & Related papers (2022-10-25T08:40:23Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT (a toy sketch of this word-level replacement idea appears after this list).
Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.
arXiv Detail & Related papers (2020-06-11T13:15:59Z)
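The CoSDA-ML entry above describes word-level code-switching augmentation; the toy sketch below shows the general idea under a made-up translation table. The tiny dictionary and function name are assumptions for illustration; the actual method samples replacements from large bilingual dictionaries.

```python
# Toy sketch of multi-lingual code-switching augmentation in the spirit of CoSDA-ML.
# The dictionary below is a stand-in; it is not the method's real lexicon.
import random

# Hypothetical word-level translation table: word -> {language: translation}
TOY_DICT = {
    "book":    {"de": "buchen", "es": "reservar"},
    "table":   {"de": "Tisch", "es": "mesa"},
    "tonight": {"de": "heute Abend", "es": "esta noche"},
}


def code_switch(sentence, ratio=0.5, seed=0):
    """Randomly replace a fraction of known words with translations in random languages."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        entry = TOY_DICT.get(word.lower())
        if entry and rng.random() < ratio:
            lang = rng.choice(sorted(entry))   # pick a target language at random
            out.append(entry[lang])
        else:
            out.append(word)
    return " ".join(out)


# Each call yields a mixed-language variant of the input sentence.
print(code_switch("I want to book a table tonight", ratio=0.8))
```

Fine-tuning a multilingual encoder on such mixed sentences, with the original labels kept, is what lets a single training run cover many target languages at once.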
This list is automatically generated from the titles and abstracts of the papers on this site.