Towards continually learning new languages
- URL: http://arxiv.org/abs/2211.11703v4
- Date: Wed, 17 Jul 2024 21:56:04 GMT
- Title: Towards continually learning new languages
- Authors: Ngoc-Quan Pham, Jan Niehues, Alexander Waibel
- Abstract summary: Adding new languages after the initial training sessions can be economically beneficial, but the main challenge is catastrophic forgetting.
We combine the qualities of weight factorization and elastic weight consolidation in order to counter catastrophic forgetting.
We grow an initial 10 languages to 26 languages without catastrophic forgetting and with reasonable performance compared to training all languages from scratch.
- Score: 66.36852845415916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual speech recognition with neural networks is often implemented with batch-learning, where all of the languages are available before training. The ability to add new languages after prior training sessions can be economically beneficial, but the main challenge is catastrophic forgetting. In this work, we combine the qualities of weight factorization and elastic weight consolidation to counter catastrophic forgetting and to facilitate learning new languages quickly. This combination allowed us to eliminate catastrophic forgetting while still reaching performance on the new languages comparable to having all languages available at once: in our experiments, we grow an initial set of 10 languages to 26 languages without catastrophic forgetting and with reasonable performance compared to training on all languages from scratch.
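The abstract names two ingredients, weight factorization and elastic weight consolidation (EWC), without giving implementation details here. The following PyTorch sketch only illustrates the general shape of such a combination under stated assumptions: a shared weight with per-language multiplicative factors plus the standard EWC quadratic penalty on the shared parameters. All module and variable names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Shared weight plus per-language factors (an illustrative assumption:
    the paper's exact factorization may differ)."""
    def __init__(self, d_in, d_out, n_langs):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)  # shared across languages
        self.r = nn.Parameter(torch.ones(n_langs, d_out))            # language-specific row factors
        self.s = nn.Parameter(torch.ones(n_langs, d_in))             # language-specific column factors

    def forward(self, x, lang_id):
        # Elementwise rescaling of the shared weight by the chosen language's factors.
        w = self.weight * torch.outer(self.r[lang_id], self.s[lang_id])
        return x @ w.t()

def ewc_penalty(model, fisher, old_params):
    """Standard EWC penalty: sum_i F_i * (theta_i - theta_i_old)^2 over consolidated parameters."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:  # only shared parameters are consolidated here
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return loss

# Hypothetical single training step when adding a new language (id 25 of 26):
layer = FactorizedLinear(d_in=16, d_out=8, n_langs=26)
fisher = {"weight": torch.ones_like(layer.weight)}   # stand-in Fisher estimate from earlier stages
old = {"weight": layer.weight.detach().clone()}      # shared parameters after the previous stage
x = torch.randn(4, 16)
task_loss = layer(x, lang_id=25).pow(2).mean()       # placeholder for the real ASR loss
total_loss = task_loss + 0.1 * ewc_penalty(layer, fisher, old)
total_loss.backward()
```

The EWC term pulls the shared weight back toward its previous values, while the cheap per-language factors give the new language room to adapt quickly.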
Related papers
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
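As a concrete but hypothetical reading of the "language arithmetic" in the entry above, the sketch below forms a training-free linear combination of two language adapters' parameters; the exact operation used in the paper and the MAD-X adapter shapes shown here are assumptions.

```python
import torch

def combine_adapters(adapter_a, adapter_b, alpha=0.5):
    """Training-free linear combination of two adapters' state dicts
    (illustrative; the paper's exact arithmetic may differ)."""
    return {k: (1 - alpha) * adapter_a[k] + alpha * adapter_b[k] for k in adapter_a}

# Toy tensors standing in for two MAD-X language adapters with matching shapes.
de_adapter = {"down.weight": torch.randn(8, 32), "up.weight": torch.randn(32, 8)}
nl_adapter = {"down.weight": torch.randn(8, 32), "up.weight": torch.randn(32, 8)}
combined = combine_adapters(de_adapter, nl_adapter, alpha=0.3)
print({k: tuple(v.shape) for k, v in combined.items()})
```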
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet it remains inconclusive whether language imbalance is what causes cross-lingual generalisation in that setting.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages [40.41642013737395]
Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.
We aim to understand which model adapts better to languages unseen during pre-training.
We fine-tune both models on 13 unseen languages and 18 seen languages.
arXiv Detail & Related papers (2023-05-21T23:53:12Z)
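For context on what adapting these two families involves, here is a minimal Hugging Face transformers sketch that loads one public checkpoint of each; whether the paper used these exact sizes is an assumption, and the actual fine-tuning recipes (CTC for XLS-R, sequence-to-sequence for Whisper) are omitted.

```python
from transformers import Wav2Vec2ForCTC, WhisperForConditionalGeneration

# Public checkpoints of the two model families compared in the entry above.
xlsr = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xls-r-300m")
whisper = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# A common choice before fine-tuning XLS-R on a new language: keep the
# convolutional feature encoder frozen and train the rest plus a fresh CTC head.
xlsr.freeze_feature_encoder()

for name, model in [("XLS-R 300M", xlsr), ("Whisper small", whisper)]:
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"{name}: {trainable / 1e6:.0f}M trainable parameters")
```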
- Parameter-Efficient Finetuning for Robust Continual Multilingual Learning [15.823345795987237]
We introduce and study the problem of Continual Multilingual Learning (CML).
A previously trained multilingual model is periodically updated using new data arriving in stages.
If the new data is present only in a subset of languages, we find that the resulting model improves only on the languages included in the latest update, while its performance on all the remaining languages degrades significantly.
We propose LAFT-URIEL, a parameter-efficient finetuning strategy which aims to increase the number of languages on which the model improves after an update, while reducing the magnitude of loss in performance for the remaining languages.
arXiv Detail & Related papers (2022-09-14T16:45:13Z)
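The LAFT-URIEL specifics (for example, how URIEL typological information is used) are not described in this summary; the sketch below only illustrates the underlying parameter-efficient pattern, freezing the previously trained base model and updating a small adapter during a continual update, with all names and sizes being assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small residual bottleneck module trained during an update (illustrative)."""
    def __init__(self, d_model=64, bottleneck=8):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual keeps base behaviour close

# Previously trained multilingual base model (stand-in) and a fresh adapter.
base = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
adapter = Adapter()

# Continual update: freeze the base so languages absent from the new data cannot
# be overwritten, and tune only the adapter on the newly arrived stage.
for p in base.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)

x = torch.randn(16, 64)                 # stand-in batch from the latest update stage
loss = adapter(base(x)).pow(2).mean()   # placeholder for the real task loss
loss.backward()
optimizer.step()
```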
- Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models [4.731313022026271]
In this study, we focus on a single low-resource language and perform extensive evaluation and probing experiments using cross-lingual post-training (XPT).
Results show that XPT not only outperforms or performs on par with monolingual models trained with orders of magnitude more data, but is also highly efficient in the transfer process.
arXiv Detail & Related papers (2022-09-14T05:20:52Z)
- Phylogeny-Inspired Adaptation of Multilingual Models to New Languages [43.62238334380897]
We show how language phylogenetic information can be used to improve cross-lingual transfer by leveraging closely related languages.
We perform adapter-based training on languages from diverse language families (Germanic, Uralic, Tupian, Uto-Aztecan) and evaluate on both syntactic and semantic tasks.
arXiv Detail & Related papers (2022-05-19T15:49:19Z)
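As a toy illustration of using phylogenetic information to pick transfer languages, the snippet below selects available languages from the same family as a target; the family table is a small made-up subset and the selection rule is an assumption, not the paper's procedure.

```python
# Tiny, made-up family table covering the families mentioned in the entry above.
FAMILY = {
    "de": "Germanic", "nl": "Germanic", "is": "Germanic",
    "fi": "Uralic", "et": "Uralic",
    "gn": "Tupian",
    "nah": "Uto-Aztecan",
}

def related_languages(target, available):
    """Return the available languages sharing the target's family (illustrative rule)."""
    family = FAMILY.get(target)
    return [lang for lang in available if lang != target and FAMILY.get(lang) == family]

# Pick source languages whose adapters could help an Icelandic ("is") target.
print(related_languages("is", ["de", "nl", "fi", "gn"]))  # -> ['de', 'nl']
```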
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
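The grouping step described above could in principle look like the k-means sketch below over per-language representation vectors; the random vectors and the choice of k-means are assumptions standing in for representations extracted from a multilingual pre-trained model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Random stand-ins for per-language representations; the paper's actual vectors
# and grouping method are not reproduced here.
rng = np.random.default_rng(0)
langs = ["en", "de", "nl", "fi", "et", "hu", "zh", "ja", "ko"]
lang_vecs = rng.normal(size=(len(langs), 32))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(lang_vecs)
for cluster_id in range(3):
    group = [lang for lang, label in zip(langs, kmeans.labels_) if label == cluster_id]
    print(f"representation sprachbund {cluster_id}: {group}")
```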
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
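A minimal sketch of the setting the entry above studies: pooling per-language phone inventories into one shared output vocabulary so that every phone, even a language-unique one, sits on top of layers trained on all languages' audio. The inventories below are made up for illustration and are not the paper's data.

```python
# Made-up phone inventories; real ones would come from pronunciation lexicons.
inventories = {
    "en": {"p", "t", "k", "æ", "ɪ"},
    "de": {"p", "t", "k", "ç", "ʏ"},
    "pl": {"p", "t", "k", "ɕ", "ɨ"},
}

# Union of all inventories gives the multilingual output vocabulary.
shared_phones = sorted(set().union(*inventories.values()))
phone_to_id = {phone: idx for idx, phone in enumerate(shared_phones)}

# A phone unique to one language (e.g. "ɕ") still shares the acoustic layers
# trained on every language's audio, which is the transfer effect analysed above.
print(phone_to_id)
```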