On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based
Multilingual Model
- URL: http://arxiv.org/abs/2311.07820v1
- Date: Tue, 14 Nov 2023 00:43:33 GMT
- Title: On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based
Multilingual Model
- Authors: Nohil Park, Joonsuk Park, Kang Min Yoo, Sungroh Yoon
- Abstract summary: We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective than fine-tuning at enhancing the performance of low-resource languages.
- Score: 49.81429697921861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An exciting advancement in the field of multilingual models is the emergence
of autoregressive models with zero- and few-shot capabilities, a phenomenon
widely reported in large-scale language models. To further improve model
adaptation to cross-lingual tasks, another trend is to adapt the language
models with either full fine-tuning or parameter-efficient tuning. However,
the interaction between parameter-efficient fine-tuning (PEFT) and
cross-lingual tasks in multilingual autoregressive models has yet to be
studied. Specifically, we lack an understanding of the role that the
linguistic distribution within multilingual models plays in the effectiveness
of token-based prompt tuning. To address this question, we conduct experiments
comparing prompt tuning and fine-tuning on the decoder-based multilingual
model, XGLM, with four cross-lingual tasks (XNLI, PAWS-X, POS, NER). According
to our study, prompt tuning achieves performance on par with or better than
fine-tuning across all languages while updating at most 0.13% of the model
parameters. Moreover, we empirically show that prompt tuning is more effective
than fine-tuning at enhancing the performance of low-resource languages. Our
further analysis shows that the phenomenon is related to the tokenization
scheme of the multilingual model.
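Illustrative sketch (not from the paper): the snippet below shows, in plain PyTorch with Hugging Face Transformers, the kind of token-based prompt tuning the study evaluates. The decoder-only multilingual backbone (here the smallest XGLM checkpoint) is frozen and only a short sequence of prepended soft-prompt embeddings is trained. The checkpoint name, prompt length, learning rate, and toy batch are assumptions for illustration, not the paper's configuration.

import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/xglm-564M"  # smallest XGLM checkpoint; chosen here only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze the backbone; only the soft prompt below will receive gradient updates.
for p in model.parameters():
    p.requires_grad = False

embed = model.get_input_embeddings()
n_prompt = 16  # number of virtual prompt tokens (assumption)
soft_prompt = nn.Parameter(embed.weight[:n_prompt].detach().clone())  # init from real token embeddings

def forward_with_prompt(input_ids, attention_mask):
    # Prepend the trainable prompt embeddings to the token embeddings.
    # Note: some backbones rescale embeddings computed from input_ids internally;
    # a faithful reproduction should match that scaling when building inputs_embeds.
    tok_embeds = embed(input_ids)                               # (B, T, H)
    bsz = input_ids.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(bsz, -1, -1)       # (B, P, H)
    inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)      # (B, P+T, H)
    prompt_mask = torch.ones(bsz, n_prompt, dtype=attention_mask.dtype)
    mask = torch.cat([prompt_mask, attention_mask], dim=1)
    # Ignore prompt and padding positions in the language-modeling loss.
    token_labels = input_ids.masked_fill(attention_mask == 0, -100)
    prompt_labels = torch.full((bsz, n_prompt), -100, dtype=input_ids.dtype)
    labels = torch.cat([prompt_labels, token_labels], dim=1)
    return model(inputs_embeds=inputs_embeds, attention_mask=mask, labels=labels)

optimizer = torch.optim.AdamW([soft_prompt], lr=3e-2)  # only the prompt embeddings are optimized
batch = tokenizer(["Das ist ein Beispiel.", "This is an example."],
                  return_tensors="pt", padding=True)
out = forward_with_prompt(batch["input_ids"], batch["attention_mask"])
out.loss.backward()
optimizer.step()

With n_prompt virtual tokens, only roughly n_prompt x hidden_size parameters receive gradients, a tiny fraction of the 564M-parameter backbone, which is consistent with the paper's report of updating at most 0.13% of the model parameters.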
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Efficient Compression of Multitask Multilingual Speech Models [0.0]
DistilWhisper is able to bridge the ASR performance gap for under-represented languages while retaining the advantages of multitask and multilingual capabilities.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
arXiv Detail & Related papers (2024-05-02T03:11:59Z)
- MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models [7.321459642283822]
Finetuning can improve the performance of language models without requiring massive resources and compute.
We finetune Llama-2-7B and Mistral-7B models on two synthetic multilingual instruction tuning datasets to determine the effect on model performance.
We find that PEFT of smaller open-source models sometimes bridges the gap between the performance of these models and the larger ones; however, English performance can take a hit.
arXiv Detail & Related papers (2024-01-15T11:06:43Z)
- Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems using self-supervised models with the Conformer architecture improve over the results of prior work on FLEURS by a relative 28.4% CER.
arXiv Detail & Related papers (2023-02-24T18:59:51Z)
- Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models [95.32691891392903]
In this paper, we perform cross-lingual evaluation on various NLU tasks using prompt tuning and compare it with fine-tuning.
The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets.
arXiv Detail & Related papers (2022-10-22T05:48:02Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models [12.759281077118567]
Massively Multilingual Transformer-based Language Models have been observed to be surprisingly effective on zero-shot transfer across languages.
We build upon some of the existing techniques for predicting the zero-shot performance on a task by modeling it as a multi-task learning problem.
arXiv Detail & Related papers (2022-05-12T14:47:03Z)
- Evaluating Cross-Lingual Transfer Learning Approaches in Multilingual Conversational Agent Models [1.52292571922932]
We propose a general multilingual model framework for Natural Language Understanding (NLU) models.
We show that these multilingual models can reach the same or better performance than monolingual models on language-specific test data.
arXiv Detail & Related papers (2020-12-07T17:14:52Z)
- Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models [63.92643612630657]
This paper attempts to peek into the black-box of multilingual optimization through the lens of loss function geometry.
We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with language proximity.
We derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks.
arXiv Detail & Related papers (2020-10-12T17:26:34Z)
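Illustrative sketch (not from the paper): the Gradient Vaccine entry above describes gradient cosine similarity between languages as the key optimization signal. The toy snippet below shows, under assumed hyperparameters (EMA decay, mixing coefficient) and random gradients, one simplified way such a signal can drive more aligned updates; it is not the paper's exact update rule.

import torch
import torch.nn.functional as F

def aligned_update(g_a, g_b, ema_target, beta=0.01, mix=0.1):
    # Track a running (EMA) target of how similar the two languages' gradients usually are.
    sim = F.cosine_similarity(g_a, g_b, dim=0)
    ema_target = (1 - beta) * ema_target + beta * sim
    # If the current update is less aligned than usual, mix in part of the other gradient.
    if sim < ema_target:
        g_a = g_a + mix * g_b
    return g_a, ema_target

# Toy usage with flattened per-language gradients.
g_en, g_sw = torch.randn(1000), torch.randn(1000)
g_en_adjusted, target = aligned_update(g_en, g_sw, ema_target=torch.tensor(0.0))

In practice the per-language gradients would come from backpropagating each language's loss through the shared multilingual model rather than from torch.randn.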