MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting
- URL: http://arxiv.org/abs/2601.20300v1
- Date: Wed, 28 Jan 2026 06:48:52 GMT
- Title: MiLorE-SSL: Scaling Multilingual Capabilities in Self-Supervised Models without Forgetting
- Authors: Jing Xu, Minglin Wu, Xueyuan Chen, Xixin Wu, Helen Meng
- Abstract summary: MiLorE-SSL is a lightweight framework that combines LoRA modules with a soft mixture-of-experts mechanism for efficient continual multilingual training. LoRA provides efficient low-rank adaptation, while soft MoE promotes flexible expert sharing across languages, reducing cross-lingual interference. Experiments on ML-SUPERB demonstrate that MiLorE-SSL achieves strong performance on new languages and improves performance on existing ones with only 2.14% trainable parameters.
- Score: 69.6938830307759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) has greatly advanced speech representation learning, but multilingual SSL models remain constrained to languages encountered during pretraining. Retraining from scratch to incorporate new languages is computationally expensive, while sequential training without mitigation strategies often leads to catastrophic forgetting. To address this, we propose MiLorE-SSL, a lightweight framework that combines LoRA modules with a soft mixture-of-experts (MoE) mechanism for efficient continual multilingual training. LoRA provides efficient low-rank adaptation, while soft MoE promotes flexible expert sharing across languages, reducing cross-lingual interference. To further mitigate forgetting, we introduce limited replay data from existing languages, avoiding reliance on large historical corpora. Experiments on ML-SUPERB demonstrate that MiLorE-SSL achieves strong performance on new languages and improves performance on existing ones with only 2.14% trainable parameters.
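For intuition, the sketch below shows one plausible reading of the abstract: a frozen pretrained linear projection augmented with several LoRA experts whose low-rank updates are mixed by a soft (softmax) gate, so that only the experts and the gate are trained. The class name, shapes, gating scheme, and hyperparameters are assumptions for illustration and not the authors' released implementation.

```python
# Hypothetical sketch of a soft mixture of LoRA experts on top of a frozen layer.
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class SoftMoELoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, num_experts=4, rank=8, alpha=16.0):
        super().__init__()
        # Frozen pretrained projection (stands in for a weight inside the SSL model).
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One low-rank (A, B) pair per expert; only these and the gate are trained.
        self.lora_A = nn.Parameter(torch.randn(num_experts, in_dim, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, out_dim))
        self.gate = nn.Linear(in_dim, num_experts)
        self.scaling = alpha / rank

    def forward(self, x):
        # x: (batch, time, in_dim)
        weights = torch.softmax(self.gate(x), dim=-1)                 # soft routing over experts
        delta = torch.einsum("bti,eir->bter", x, self.lora_A)         # per-expert down-projection
        delta = torch.einsum("bter,ero->bteo", delta, self.lora_B)    # per-expert up-projection
        delta = (weights.unsqueeze(-1) * delta).sum(dim=2)            # weighted sum of expert updates
        return self.base(x) + self.scaling * delta


if __name__ == "__main__":
    layer = SoftMoELoRALinear(in_dim=768, out_dim=768)
    out = layer(torch.randn(2, 50, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(out.shape, f"trainable fraction: {trainable / total:.2%}")
```

Running the snippet prints the output shape and the fraction of trainable parameters, mirroring the paper's emphasis on updating only a small share of the model.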
Related papers
- Lamer-SSL: Layer-aware Mixture of LoRA Experts for Continual Multilingual Expansion of Self-supervised Models without Forgetting [69.6938830307759]
Lamer-SSL is a parameter-efficient framework that integrates a Layer-Aware MixturE of LoRA Experts (Lamer) module with a replay strategy. Experiments on automatic speech recognition (ASR) and language identification (LID) demonstrate that Lamer-SSL extends self-supervised models to new languages effectively.
arXiv Detail & Related papers (2026-02-13T09:22:22Z)
- MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing [78.62611800987817]
Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data.
We propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to enhance the multilingual capability.
arXiv Detail & Related papers (2024-08-21T07:43:49Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models [69.59613095232598]
We propose adaptation methods that integrate LoRA into existing SSL models to extend them to new languages. We also develop preservation strategies, including data combination and re-clustering, to retain abilities on existing languages.
arXiv Detail & Related papers (2024-06-20T08:13:30Z)
- Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language [7.289015788793582]
This work focuses on increasing technological participation for the Sámi language.
We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages.
We have compiled the available Sámi language resources from the web to create a clean dataset for training language models.
arXiv Detail & Related papers (2024-05-09T13:54:22Z)
- Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models [110.10545153845051]
Cross-lingual Expert Language Models (X-ELMs) are specialized to different languages while remaining effective as a multilingual ensemble.
X-ELM provides benefits beyond performance improvements: new experts can be iteratively added, adapting X-ELM to new languages without catastrophic forgetting.
arXiv Detail & Related papers (2024-01-19T01:07:50Z)
- Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages.
Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
arXiv Detail & Related papers (2023-07-03T17:12:44Z)
- MergeDistill: Merging Pre-trained Language Models using Distillation [5.396915402673246]
We propose MergeDistill, a framework to merge pre-trained LMs in a way that can best leverage their assets with minimal dependencies.
We demonstrate the applicability of our framework in a practical setting by leveraging pre-existing teacher LMs and training student LMs that perform competitively with or even outperform teacher LMs trained on several orders of magnitude more data and with a fixed model capacity.
arXiv Detail & Related papers (2021-06-05T08:22:05Z)