Overcoming Catastrophic Forgetting in Massively Multilingual Continual
Learning
- URL: http://arxiv.org/abs/2305.16252v1
- Date: Thu, 25 May 2023 17:06:34 GMT
- Title: Overcoming Catastrophic Forgetting in Massively Multilingual Continual
Learning
- Authors: Genta Indra Winata, Lingjue Xie, Karthik Radhakrishnan, Shijie Wu,
Xisen Jin, Pengxiang Cheng, Mayank Kulkarni, Daniel Preotiuc-Pietro
- Abstract summary: We study catastrophic forgetting, as well as methods to minimize this, in a massively multilingual continual learning framework involving up to 51 languages.
We present LR ADJUST, a learning rate scheduling method that is simple, yet effective in preserving new information without strongly overwriting past knowledge.
- Score: 34.034825754625935
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Real-life multilingual systems should be able to efficiently incorporate new
languages as data distributions fed to the system evolve and shift over time.
To do this, systems need to handle the issue of catastrophic forgetting, where
the model performance drops for languages or tasks seen further in its past. In
this paper, we study catastrophic forgetting, as well as methods to minimize
this, in a massively multilingual continual learning framework involving up to
51 languages and covering both classification and sequence labeling tasks. We
present LR ADJUST, a learning rate scheduling method that is simple, yet
effective in preserving new information without strongly overwriting past
knowledge. Furthermore, we show that this method is effective across multiple
continual learning approaches. Finally, we provide further insights into the
dynamics of catastrophic forgetting in this massively multilingual setup.
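The abstract does not spell out the exact schedule, so the snippet below is only a minimal sketch of the idea behind LR ADJUST: shrink the learning rate as each new language arrives so that updates on later languages do not strongly overwrite what was learned on earlier ones. The decay rule, its hyperparameters, and the names used here are illustrative assumptions, not the paper's definition.

```python
# Hypothetical sketch of adjusting the learning rate across a stream of languages.
# The exponential decay rule and its hyperparameters are illustrative assumptions;
# the paper's exact LR ADJUST schedule may differ.

def adjusted_lr(base_lr: float, language_index: int, decay: float = 0.5) -> float:
    """Return a reduced learning rate for the language_index-th language in the stream.

    Later languages are trained with smaller steps so that new updates do not
    strongly overwrite parameters fitted on earlier languages.
    """
    return base_lr * (decay ** language_index)


if __name__ == "__main__":
    base_lr = 5e-5
    stream = ["en", "de", "sw", "th"]  # illustrative subset of the 51 languages
    for i, lang in enumerate(stream):
        print(f"{lang}: lr = {adjusted_lr(base_lr, i):.2e}")
```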
Related papers
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
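The summary only names the technique, so the following is a hedged sketch of what arithmetic over language adapter weights could look like, in the spirit of task/weight arithmetic. The actual combination rule used by language arithmetic may differ, and all names here are hypothetical.

```python
# Hypothetical illustration of arithmetic over (flattened) language adapter weights,
# in the spirit of task/weight arithmetic; the actual rule used by "language
# arithmetic" may differ, and these names are placeholders.
import numpy as np

def combine_adapters(adapter_a: np.ndarray, adapter_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend two language adapters by weight interpolation (assumed combination rule)."""
    return alpha * adapter_a + (1.0 - alpha) * adapter_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    en_adapter = rng.normal(size=8)  # stand-in for flattened adapter parameters
    sw_adapter = rng.normal(size=8)
    print(combine_adapters(en_adapter, sw_adapter, alpha=0.3))
```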
arXiv Detail & Related papers (2024-04-24T08:52:40Z) - Orthogonal Subspace Learning for Language Model Continual Learning [45.35861158925975]
O-LoRA is a simple and efficient approach for continual learning in language models.
Our method induces only marginal additional parameter costs and requires no user data storage for replay.
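As a rough illustration of orthogonal subspace learning, the sketch below penalizes overlap between a new task's LoRA subspace and previously learned ones; whether O-LoRA applies the constraint in exactly this form is an assumption made for illustration.

```python
# Sketch of an orthogonality penalty between the low-rank subspace of a new task's
# LoRA update and those of previously learned tasks. Whether O-LoRA applies the
# penalty exactly in this form is an assumption.
import torch

def orthogonality_penalty(new_A: torch.Tensor, old_As: list[torch.Tensor]) -> torch.Tensor:
    """Penalize overlap between the new task's LoRA A-matrix and earlier tasks' A-matrices.

    Each A-matrix has shape (rank, hidden_dim).
    """
    penalty = torch.zeros((), dtype=new_A.dtype)
    for old_A in old_As:
        penalty = penalty + (new_A @ old_A.T).pow(2).sum()
    return penalty

if __name__ == "__main__":
    hidden, rank = 16, 4
    old_tasks = [torch.randn(rank, hidden) for _ in range(2)]
    new_task = torch.randn(rank, hidden, requires_grad=True)
    loss = orthogonality_penalty(new_task, old_tasks)
    loss.backward()  # gradient pushes the new subspace away from earlier ones
    print(float(loss))
```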
arXiv Detail & Related papers (2023-10-22T02:23:44Z) - Towards continually learning new languages [66.36852845415916]
Batch-learning of languages can be economically beneficial, but the main challenge is catastrophic forgetting.
We combine the qualities of weight factorization and elastic weight consolidation in order to counter catastrophic forgetting.
We scale to 26 languages without catastrophic forgetting and with reasonable performance compared to training on all languages from scratch.
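Elastic weight consolidation (EWC) is a standard regularizer; the sketch below shows its quadratic penalty anchoring weights that were important for earlier languages, with placeholder Fisher values. How the paper combines it with weight factorization is not shown here.

```python
# Minimal sketch of the elastic weight consolidation (EWC) penalty mentioned above.
# Fisher information values are placeholders; in practice they are estimated from
# gradients on the previously learned languages.
import torch

def ewc_penalty(params, old_params, fisher, lam: float = 1.0) -> torch.Tensor:
    """Quadratic penalty anchoring current weights to those learned on earlier languages."""
    penalty = torch.zeros(())
    for p, p_old, f in zip(params, old_params, fisher):
        penalty = penalty + (f * (p - p_old).pow(2)).sum()
    return 0.5 * lam * penalty

if __name__ == "__main__":
    old = [torch.randn(3, 3)]
    cur = [old[0] + 0.1 * torch.randn(3, 3)]
    fisher = [torch.rand(3, 3)]  # placeholder Fisher diagonal
    print(float(ewc_penalty(cur, old, fisher, lam=10.0)))
```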
arXiv Detail & Related papers (2022-11-21T18:24:34Z) - Parameter-Efficient Finetuning for Robust Continual Multilingual
Learning [15.823345795987237]
We introduce and study the problem of Continual Multilingual Learning (CML).
A previously trained multilingual model is periodically updated using new data arriving in stages.
If the new data is present only in a subset of languages, we find that the resulting model shows improved performance only on the languages included in the latest update, while its performance on all the remaining languages degrades significantly.
We propose LAFT-URIEL, a parameter-efficient finetuning strategy which aims to increase the number of languages on which the model improves after an update, while reducing the magnitude of loss in performance for the remaining languages.
arXiv Detail & Related papers (2022-09-14T16:45:13Z) - Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
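For context, the sketch below computes two standard continual-learning quantities, average final accuracy and average forgetting, from a matrix of per-stage accuracies; whether the CCL paper uses exactly these definitions is an assumption.

```python
# Standard continual-learning measurements computed from an accuracy matrix where
# acc[i][j] is accuracy on language j after training stage i. Whether the CCL paper
# uses exactly these definitions is an assumption made for illustration.

def average_forgetting(acc: list[list[float]]) -> float:
    """Average drop from each language's best historical accuracy to its final accuracy."""
    final = acc[-1]
    drops = []
    for j in range(len(final) - 1):  # the last-learned language cannot have been forgotten yet
        best = max(acc[i][j] for i in range(len(acc)) if j < len(acc[i]))
        drops.append(best - final[j])
    return sum(drops) / len(drops)

if __name__ == "__main__":
    # Accuracy on up to 3 languages after each of 3 sequential training stages.
    acc = [
        [0.80],
        [0.74, 0.82],
        [0.70, 0.78, 0.85],
    ]
    print(f"average final accuracy: {sum(acc[-1]) / len(acc[-1]):.3f}")
    print(f"average forgetting:     {average_forgetting(acc):.3f}")
```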
arXiv Detail & Related papers (2022-05-23T09:25:43Z) - Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooling data from all languages, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z) - Multilingual and cross-lingual document classification: A meta-learning
approach [24.66829920826166]
We propose a meta-learning approach to document classification in a limited-resource setting.
We show effectiveness in two settings: few-shot cross-lingual adaptation to previously unseen languages, and multilingual joint training.
arXiv Detail & Related papers (2021-01-27T10:22:56Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to the models' limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text
Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)