Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
- URL: http://arxiv.org/abs/2406.14092v1
- Date: Thu, 20 Jun 2024 08:13:30 GMT
- Title: Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
- Authors: Jing Xu, Minglin Wu, Xixin Wu, Helen Meng,
- Abstract summary: We propose adaptation methods which integrate LoRA to existed SSL models to extend new language.
We also develop preservation strategies which include data combination and re-clustering to retain abilities on existed languages.
- Score: 60.09618700199927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised (SSL) models have shown great performance in various downstream tasks. However, they are typically developed for limited languages, and may encounter new languages in real-world. Developing a SSL model for each new language is costly. Thus, it is vital to figure out how to efficiently adapt existed SSL models to a new language without impairing its original abilities. We propose adaptation methods which integrate LoRA to existed SSL models to extend new language. We also develop preservation strategies which include data combination and re-clustering to retain abilities on existed languages. Applied to mHuBERT, we investigate their effectiveness on speech re-synthesis task. Experiments show that our adaptation methods enable mHuBERT to be applied to a new language (Mandarin) with MOS value increased about 1.6 and the relative value of WER reduced up to 61.72%. Also, our preservation strategies ensure that the performance on both existed and new languages remains intact.
Related papers
- TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning [41.28933724210434]
Language model continual learning (CL) has recently attracted significant interest for its ability to adapt large language models (LLMs) to dynamic real-world scenarios without retraining.
Existing approaches commonly utilize multiple parameter-efficient fine-tuning (PEFT) blocks to acquire task-specific knowledge, yet these methods are inefficient and fail to leverage potential knowledge transfer across tasks.
We introduce a novel CL framework for language models, named Task Skill Localization and Consolidation (TaSL), which boosts knowledge transfer without depending on memory replay.
arXiv Detail & Related papers (2024-08-09T17:44:45Z) - ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z) - Efficient Compression of Multitask Multilingual Speech Models [0.0]
DistilWhisper is able to bridge the performance gap in ASR for these languages while retaining the advantages of multitask and multilingual capabilities.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
arXiv Detail & Related papers (2024-05-02T03:11:59Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - Efficiently Adapting Pretrained Language Models To New Languages [9.33333013114014]
Recent large language models (LLM) exhibit sub-optimal performance on low-resource languages.
We study how to efficiently adapt any existing pretrained LLM to a new language without running into these issues.
arXiv Detail & Related papers (2023-11-09T20:59:08Z) - Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages.
Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
arXiv Detail & Related papers (2023-07-03T17:12:44Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting [50.24676567971536]
The BLOOM model is a large publicly available multilingual language model, but its pretraining was limited to 46 languages.
We apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages.
We conclude that with sufficient training data language adaptation can generalize well to diverse languages.
arXiv Detail & Related papers (2022-12-19T15:24:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.