Master-ASR: Achieving Multilingual Scalability and Low-Resource
Adaptation in ASR with Modular Learning
- URL: http://arxiv.org/abs/2306.15686v1
- Date: Fri, 23 Jun 2023 16:23:00 GMT
- Title: Master-ASR: Achieving Multilingual Scalability and Low-Resource
Adaptation in ASR with Modular Learning
- Authors: Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin
- Abstract summary: Master-ASR simultaneously achieves strong multilingual scalability and low-resource adaptation ability.
Our framework achieves a 0.13 to 2.41 lower character error rate (CER) with 30% smaller inference overhead than state-of-the-art (SOTA) methods.
- Score: 28.592569051244375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the impressive performance recently achieved by automatic speech
recognition (ASR), we observe two primary challenges that hinder its broader
applications: (1) The difficulty of introducing scalability into the model to
support more languages with limited training, inference, and storage overhead;
(2) The difficulty of achieving effective low-resource adaptation while avoiding
over-fitting and catastrophic forgetting.
Inspired by recent findings, we hypothesize that we can address the above
challenges with modules widely shared across languages. To this end, we propose
an ASR framework, dubbed Master-ASR, that, for the first time,
simultaneously achieves strong multilingual scalability and low-resource
adaptation ability thanks to its modularize-then-assemble strategy.
Specifically, Master-ASR learns a small set of generalizable sub-modules and
adaptively assembles them for different languages to reduce the multilingual
overhead and enable effective knowledge transfer for low-resource adaptation.
Extensive experiments and visualizations demonstrate that Master-ASR can
effectively discover language similarity and improve multilingual and
low-resource ASR performance over state-of-the-art (SOTA) methods. For example, our
framework achieves a 0.13 to 2.41 lower character error rate (CER) with 30% smaller
inference overhead than SOTA solutions on multilingual ASR, and a comparable CER with
nearly 50 times fewer trainable parameters than SOTA solutions on low-resource tuning.
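The modularize-then-assemble strategy described in the abstract can be pictured with a small sketch: a shared pool of sub-modules plus a tiny set of per-language assembly weights that mixes them into one layer, so supporting a new language only adds a weight vector. This is a hypothetical illustration, not the authors' implementation; the pool size, softmax mixing, and feed-forward module shape are assumptions.

```python
# Illustrative sketch (not the paper's code) of a "modularize-then-assemble"
# layer: a shared pool of small sub-modules is mixed per language by learned
# assembly weights, so a new language only adds a weight vector.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AssembledLayer(nn.Module):
    def __init__(self, dim: int, num_modules: int = 8, num_languages: int = 10):
        super().__init__()
        # Shared pool of generalizable sub-modules (here: simple feed-forward blocks).
        self.pool = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_modules)]
        )
        # Per-language assembly logits; a softmax turns them into mixing weights.
        self.assembly_logits = nn.Parameter(torch.zeros(num_languages, num_modules))

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # x: (batch, frames, dim)
        weights = F.softmax(self.assembly_logits[lang_id], dim=-1)   # (num_modules,)
        outputs = torch.stack([m(x) for m in self.pool], dim=0)      # (M, B, T, D)
        return (weights.view(-1, 1, 1, 1) * outputs).sum(dim=0) + x  # residual mix


# Low-resource adaptation sketch: freeze the shared pool and train only the
# tiny per-language assembly weights for the new language.
layer = AssembledLayer(dim=256)
for p in layer.pool.parameters():
    p.requires_grad = False
x = torch.randn(4, 50, 256)          # (batch, frames, features)
y = layer(x, lang_id=3)
print(y.shape)                        # torch.Size([4, 50, 256])
```

Freezing the shared pool and training only the assembly weights is one way to keep the number of trainable parameters per new language tiny, which is the intuition behind the low-resource adaptation claim above.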
Related papers
- Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR [25.566285376879094]
Multimodal models are able to leverage unlabeled text via text-only adaptation with further parameter-efficient ASR fine-tuning.
We show cross-lingual transfer from a high-resource language, achieving up to a relative 17% WER reduction over a baseline in a zero-shot setting.
arXiv Detail & Related papers (2024-10-17T11:19:44Z)
- Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition [2.7247388777405597]
We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets.
We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language.
arXiv Detail & Related papers (2024-09-25T14:09:09Z)
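A minimal sketch of the weighted cross-entropy idea from the entry above: examples from the low-resource language get a larger loss weight so the unbalanced language mix does not drown them out. The weight values and the per-example weighting scheme are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch (assumed, not the paper's exact recipe) of language-weighted
# cross-entropy: examples from the low-resource language contribute more to
# the loss than examples from high-resource languages.
import torch
import torch.nn.functional as F

# Hypothetical per-language weights, e.g. roughly inverse to the amount of data.
LANG_WEIGHTS = {"en": 1.0, "de": 1.0, "es": 1.0, "lowres": 4.0}


def weighted_ce(logits, targets, langs):
    """logits: (batch, num_classes); targets: (batch,); langs: language tag per example."""
    per_example = F.cross_entropy(logits, targets, reduction="none")  # (batch,)
    weights = torch.tensor([LANG_WEIGHTS[l] for l in langs])          # (batch,)
    return (weights * per_example).sum() / weights.sum()              # weighted mean


logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = weighted_ce(logits, targets, ["en", "de", "lowres", "lowres"])
print(loss.item())
```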
- Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages [51.12146889808824]
Meta-Whisper is a novel approach to improving automatic speech recognition for low-resource languages.
It enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning.
arXiv Detail & Related papers (2024-09-16T16:04:16Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
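As a rough illustration of the model-merging idea in the entry above, the sketch below linearly interpolates the parameters of two checkpoints that share an architecture (e.g., a base model and a language-specialized fine-tune). Plain weight averaging with a single coefficient is an assumption; practical merging methods can be considerably more elaborate.

```python
# Rough sketch of model merging by weight interpolation between two
# same-architecture checkpoints; the single coefficient `alpha` is an
# assumption, not the paper's specific merging method.
import torch
import torch.nn as nn


def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Return alpha * A + (1 - alpha) * B for every matching tensor."""
    merged = {}
    for name, tensor_a in sd_a.items():
        tensor_b = sd_b[name]
        merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return merged


# Toy demonstration with two small models of identical architecture.
def make_model() -> nn.Module:
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))


base, specialized = make_model(), make_model()
merged_model = make_model()
merged_model.load_state_dict(
    merge_state_dicts(base.state_dict(), specialized.state_dict(), alpha=0.5)
)
print(merged_model(torch.randn(2, 16)).shape)   # torch.Size([2, 4])
```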
- MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
arXiv Detail & Related papers (2024-06-25T11:03:45Z)
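The design described in the MoE-CT entry above, a frozen base model with an appended trainable MoE branch, can be sketched roughly as follows. The router, number of experts, and residual combination are assumptions made for illustration, not the paper's exact architecture.

```python
# Rough sketch of the "frozen base + appended MoE" pattern: the pretrained
# block keeps its weights fixed, and a small mixture-of-experts branch adds
# trainable capacity for low-resource languages.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenBaseWithMoE(nn.Module):
    def __init__(self, base_block: nn.Module, dim: int, num_experts: int = 4):
        super().__init__()
        self.base = base_block
        for p in self.base.parameters():          # freeze the original model
            p.requires_grad = False
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base(x)                                                 # frozen path
        gate = F.softmax(self.router(h), dim=-1)                         # (B, T, E)
        expert_out = torch.stack([e(h) for e in self.experts], dim=-2)   # (B, T, E, D)
        moe = (gate.unsqueeze(-1) * expert_out).sum(dim=-2)              # weighted sum
        return h + moe                                                   # residual add


block = FrozenBaseWithMoE(nn.Linear(64, 64), dim=64)
out = block(torch.randn(2, 10, 64))
print(out.shape)                                                         # torch.Size([2, 10, 64])
```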
- Efficient Compression of Multitask Multilingual Speech Models [0.0]
DistilWhisper is able to bridge the performance gap in ASR for under-represented languages while retaining the advantages of multitask and multilingual capabilities.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
arXiv Detail & Related papers (2024-05-02T03:11:59Z)
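The knowledge-distillation half of the DistilWhisper recipe above can be sketched with a generic distillation loss: the small student matches the large teacher's softened output distribution in addition to the usual supervised term. The temperature, mixing weight, and plain KL objective are assumptions; the language-specific expert modules are not reproduced here.

```python
# Generic knowledge-distillation loss sketch: the student matches the
# teacher's softened output distribution plus the supervised loss.
# Temperature and mixing weight are assumed values, not the paper's.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """student/teacher logits: (batch, vocab); targets: (batch,) class ids."""
    # Soft targets from the (frozen) teacher, softened by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, targets)            # supervised term
    return alpha * kd + (1.0 - alpha) * ce


student_logits = torch.randn(8, 100, requires_grad=True)
teacher_logits = torch.randn(8, 100)                          # teacher is not trained
targets = torch.randint(0, 100, (8,))
print(distillation_loss(student_logits, teacher_logits, targets).item())
```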
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts [14.999359332108767]
We propose DistilWhisper to bridge the performance gap in ASR for under-represented languages.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
Results demonstrate that our approach is more effective than standard fine-tuning or LoRA adapters.
arXiv Detail & Related papers (2023-11-02T08:37:30Z)
- A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly essential task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
arXiv Detail & Related papers (2022-05-31T05:54:18Z)
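As a rough, single-level analogue of the multi-level supervised contrastive objective named in the entry above, the sketch below pulls together embeddings that share a label and pushes apart the rest. The embedding source, temperature, and single-level formulation are assumptions; MultiSCL's sentence- and word-level views are not reproduced.

```python
# Minimal single-level supervised contrastive loss sketch: embeddings with
# the same label are pulled together, others pushed apart. The multi-level
# structure of MultiSCL is not reproduced here.
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """embeddings: (batch, dim); labels: (batch,) integer class ids."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                               # (B, B) similarities
    # Mask out self-similarity so an example is never its own positive.
    batch = labels.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = F.log_softmax(sim, dim=-1)
    # Positives: other examples in the batch that share the label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=-1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=-1) / pos_count
    return loss.mean()


embeddings = torch.randn(8, 128, requires_grad=True)
labels = torch.randint(0, 3, (8,))
print(supervised_contrastive_loss(embeddings, labels).item())
```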
- Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition [159.9312272042253]
We develop a novel adversarial meta sampling (AMS) approach to improve multilingual meta-learning ASR (MML-ASR).
AMS adaptively determines the task sampling probability for each source language.
Experimental results on two multilingual datasets show significant performance improvements when applying our AMS to MML-ASR.
arXiv Detail & Related papers (2020-12-22T09:33:14Z)