Parameter-Efficient Finetuning for Robust Continual Multilingual Learning
- URL: http://arxiv.org/abs/2209.06767v3
- Date: Mon, 28 Aug 2023 17:59:49 GMT
- Title: Parameter-Efficient Finetuning for Robust Continual Multilingual Learning
- Authors: Kartikeya Badola, Shachi Dave, Partha Talukdar
- Abstract summary: We introduce and study the problem of Continual Multilingual Learning (CML), in which a previously trained multilingual model is periodically updated using new data arriving in stages.
If the new data is present only in a subset of languages, we find that the resulting model shows improved performance only on the languages included in the latest update, while its performance on all the remaining languages degrades significantly.
We propose LAFT-URIEL, a parameter-efficient finetuning strategy which aims to increase the number of languages on which the model improves after an update, while reducing the magnitude of loss in performance for the remaining languages.
- Score: 15.823345795987237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce and study the problem of Continual Multilingual Learning (CML)
where a previously trained multilingual model is periodically updated using new
data arriving in stages. If the new data is present only in a subset of
languages, we find that the resulting model shows improved performance only on
the languages included in the latest update (and a few closely related
languages) while its performance on all the remaining languages degrades
significantly. We address this challenge by proposing LAFT-URIEL, a
parameter-efficient finetuning strategy which aims to increase the number of
languages on which the model improves after an update, while reducing the
magnitude of loss in performance for the remaining languages. LAFT-URIEL uses
linguistic knowledge to balance overfitting and knowledge sharing across
languages, allowing for an additional 25% of task languages to see an
improvement in performance after an update, while also reducing the average
magnitude of losses on the remaining languages by 78% relative.
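The abstract does not spell out how linguistic knowledge shapes the update, so the following is a minimal sketch of one plausible reading: use URIEL syntactic distances (available via the lang2vec package) to set cross-lingual sharing weights. The language codes, temperature, and softmax weighting are illustrative assumptions, not the authors' released method.

```python
# Hypothetical sketch: weight cross-lingual sharing by URIEL distance.
# lang2vec is a real package; the weighting scheme below is an assumption.
import numpy as np
import lang2vec.lang2vec as l2v

UPDATE_LANG = "hin"                          # language present in the new data
OTHER_LANGS = ["eng", "ben", "tam", "deu"]   # remaining task languages

# URIEL syntactic distance in [0, 1]; smaller means typologically closer.
dists = np.array([l2v.distance("syntactic", UPDATE_LANG, lang)
                  for lang in OTHER_LANGS])

# Closer languages receive larger sharing weights (softmax over -distance).
temperature = 0.1                            # assumed tunable hyperparameter
weights = np.exp(-dists / temperature)
weights /= weights.sum()

for lang, w in zip(OTHER_LANGS, weights):
    print(f"sharing weight for {lang}: {w:.3f}")
```

Under this reading, an update on Hindi would propagate strongly to typologically close languages such as Bengali and only weakly to distant ones, which is consistent with the reported trade-off between overfitting and knowledge sharing.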
Related papers
- LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language [2.9914612342004503]
This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages.
We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension.
The results show that continual training improves language comprehension, as reflected in perplexity scores, and that task-specific tuning generally enhances performance on downstream tasks.
arXiv Detail & Related papers (2024-05-13T13:41:59Z)
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes (a toy weight-arithmetic sketch follows this entry).
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
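As a rough illustration of the training-free idea (not the paper's exact formulation), language adapters can be merged by simple weight arithmetic; the function and state-dict layout below are hypothetical.

```python
# Hypothetical weight-arithmetic merge of two language adapters; no
# gradient steps are required. State-dict layout is illustrative.
import torch

def language_arithmetic(base, adapter_a, adapter_b, lam=0.5):
    """Blend two adapters expressed as offsets from a shared base."""
    return {
        name: base[name]
        + lam * (adapter_a[name] - base[name])
        + (1.0 - lam) * (adapter_b[name] - base[name])
        for name in base
    }

# Usage (paths are placeholders):
# merged = language_arithmetic(torch.load("base.pt"),
#                              torch.load("adapter_de.pt"),
#                              torch.load("adapter_nl.pt"))
```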
- Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages [60.162717568496355]
Large language models (LLMs) have been pre-trained on multilingual corpora, yet their performance in most languages still lags behind that of a few resource-rich languages.
arXiv Detail & Related papers (2024-02-19T15:07:32Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective than fine-tuning at enhancing the performance of low-resource languages (a minimal prompt-tuning setup is sketched after this entry).
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
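For context, a minimal soft-prompt setup with the Hugging Face peft library looks like the following; the base model and virtual-token count are placeholders rather than the paper's configuration.

```python
# Minimal prompt tuning with peft: only the soft-prompt embeddings train.
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,   # assumed setting; the paper's may differ
)
model = get_peft_model(model, config)
model.print_trainable_parameters()   # a tiny fraction of all weights
```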
- Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model to a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that if a model pretrained on a related speech variety, or on an unrelated language with similar phonological characteristics, is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance (a fine-tuning sketch follows this entry).
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
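Continued pretraining requires the full wav2vec 2.0 pretraining pipeline, but the fine-tuning half of the recipe can be sketched with Hugging Face transformers; the vocabulary size and pad id below are assumptions.

```python
# Sketch: CTC fine-tuning of a multilingual wav2vec 2.0 checkpoint for a
# new language. A new, randomly initialized CTC head is created because
# the base checkpoint ships without one.
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    vocab_size=40,     # assumption: character vocabulary of the new language
    pad_token_id=39,   # assumption: pad id from a hypothetical tokenizer
)
model.freeze_feature_encoder()   # keep the convolutional front-end fixed
# ...train on transcribed speech in the new language...
```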
- BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting [50.24676567971536]
The BLOOM model is a large publicly available multilingual language model, but its pretraining was limited to 46 languages.
We apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages.
We conclude that, with sufficient training data, language adaptation can generalize well to diverse languages (one such adaptation recipe is sketched after this entry).
arXiv Detail & Related papers (2022-12-19T15:24:45Z)
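One concrete form such an adaptation strategy can take (the paper benchmarks several, including adapter-based ones) is continued causal-LM training with LoRA, so the original BLOOM weights stay frozen; the sketch below uses a small BLOOM variant for illustration.

```python
# Sketch: LoRA-based language adaptation of BLOOM; original weights frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
# ...continue training on monolingual text in the new language...
```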
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The approach replaces the shared vocabulary with a small language-specific vocabulary and fine-tunes only the new embeddings on the new language's parallel data (see the sketch after this entry).
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
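A minimal sketch of the embedding-swap idea, assuming a Marian-style encoder-decoder and a new source language; the model name and vocabulary size are placeholders.

```python
# Sketch: freeze the whole NMT model, then swap in a small trainable
# embedding matrix for the new source language.
import torch.nn as nn
from transformers import MarianMTModel

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

for p in model.parameters():    # original parameters stay fixed, so
    p.requires_grad = False     # the initial languages cannot degrade

new_vocab_size = 8000           # assumption: small language-specific vocab
new_embeddings = nn.Embedding(new_vocab_size, model.config.d_model)

# Replace only the encoder-side embeddings; the decoder and all other
# weights are untouched, so the new matrix is the sole trainable part.
model.get_encoder().embed_tokens = new_embeddings
```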
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts (a simple token-level baseline is sketched after this entry).
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
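The simplest token-level baseline for an unseen script (the paper's proposed methods are more data-efficient refinements of this idea) is to extend the tokenizer and grow the embedding matrix; the example tokens below are illustrative.

```python
# Baseline sketch: add unseen-script tokens and resize the embeddings.
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

new_script_tokens = ["ଏ", "ଖ", "ଗ"]   # assumption: characters of an unseen script
tokenizer.add_tokens(new_script_tokens)

# Newly added rows are randomly initialized; continued training (or a
# smarter initialization, as studied in the paper) is needed afterwards.
model.resize_token_embeddings(len(tokenizer))
```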
- When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models [2.457872341625575]
Transfer learning based on pretraining language models on a large amount of raw data has become a new norm to reach state-of-the-art performance in NLP.
We show that such models behave in multiple ways on unseen languages.
arXiv Detail & Related papers (2020-10-24T10:15:03Z)