Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient
Pruning of A Multilingual ASR Model
- URL: http://arxiv.org/abs/2309.13018v2
- Date: Thu, 11 Jan 2024 19:15:32 GMT
- Title: Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient
Pruning of A Multilingual ASR Model
- Authors: Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda
Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli
- Abstract summary: We propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently.
Our approach dynamically adapts the sub-network, avoiding premature decisions about a fixed sub-network structure.
We show that our approach outperforms existing pruning methods when targeting sparse monolingual models.
- Score: 27.916389316449443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network pruning offers an effective method for compressing a
multilingual automatic speech recognition (ASR) model with minimal performance
loss. However, it requires several rounds of pruning and re-training to be
run for each language. In this work, we propose the use of an adaptive
masking approach in two scenarios for pruning a multilingual ASR model
efficiently, resulting in either sparse monolingual models or a sparse
multilingual model (named Dynamic ASR Pathways). Our approach dynamically
adapts the sub-network, avoiding premature decisions about a fixed sub-network
structure. We show that our approach outperforms existing pruning methods when
targeting sparse monolingual models. Further, we illustrate that Dynamic ASR
Pathways jointly discovers and trains better sub-networks (pathways) of a
single multilingual model by adapting from different sub-network
initializations, thereby reducing the need for language-specific pruning.
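The idea of a mask that adapts during training, rather than being fixed after an initial pruning decision, can be illustrated on a toy problem. The sketch below is not the paper's algorithm: the abstract does not specify the pruning criterion or the update rule. It assumes magnitude-based pruning and a straight-through style update in which masked weights keep receiving gradients and can re-enter the sub-network when the mask is recomputed. The names `magnitude_mask`, `train_adaptive`, and `remask_every` are hypothetical.

```python
import random

def magnitude_mask(weights, sparsity):
    """Return a boolean mask that prunes the smallest-magnitude weights.

    Assumption: magnitude is the pruning criterion (not confirmed by the abstract).
    """
    k = int(sparsity * len(weights))  # number of weights to prune
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [i not in pruned for i in range(len(weights))]

def train_adaptive(X, y, sparsity=0.5, steps=200, lr=0.1, remask_every=20, seed=0):
    """Fit a sparse linear model while periodically re-deriving the mask.

    The forward pass uses only unmasked weights, but every weight is updated,
    so a pruned weight can grow and be re-admitted at the next re-masking.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    mask = magnitude_mask(w, sparsity)
    for step in range(steps):
        if step % remask_every == 0:
            mask = magnitude_mask(w, sparsity)  # adapt the sub-network
        eff = [wi if m else 0.0 for wi, m in zip(w, mask)]
        errors = [sum(e * x for e, x in zip(eff, row)) - t for row, t in zip(X, y)]
        for j in range(dim):
            # Gradient of the mean squared error w.r.t. the effective weight j
            grad_j = sum(err * row[j] for err, row in zip(errors, X)) / len(X)
            w[j] -= lr * grad_j
    return [wi if m else 0.0 for wi, m in zip(w, mask)], mask

# Toy data: a target with three non-zero coefficients out of six
data_rng = random.Random(1)
true_w = [2.0, -1.5, 0.0, 0.0, 1.0, 0.0]
X = [[data_rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(100)]
y = [sum(t * x for t, x in zip(true_w, row)) for row in X]

sparse_w, mask = train_adaptive(X, y, sparsity=0.5)
print(mask)
```

With a fixed mask, the sub-network would be whatever the random initialization happened to favour. Recomputing the mask every `remask_every` steps lets the retained set change as training progresses, which is the behaviour the abstract describes as avoiding premature decisions about the sub-network structure.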
Related papers
- Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR [25.566285376879094]
Multimodal models are able to leverage unlabeled text via text-only adaptation with further parameter-efficient ASR fine-tuning.
We show cross-lingual transfer from a high-resource language, achieving up to a relative 17% WER reduction over a baseline in a zero-shot setting.
arXiv Detail & Related papers (2024-10-17T11:19:44Z)
- FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers [55.2480439325792]
We propose FUSE, an approach to approximating an adapter layer that maps from one model's textual embedding space to another, even across different tokenizers.
We show the efficacy of our approach via multi-objective optimization over vision-language and causal language models for image captioning and sentiment-based image captioning.
arXiv Detail & Related papers (2024-08-09T02:16:37Z)
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Efficient Compression of Multitask Multilingual Speech Models [0.0]
DistilWhisper is able to bridge the performance gap in ASR for these languages while retaining the advantages of multitask and multilingual capabilities.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
arXiv Detail & Related papers (2024-05-02T03:11:59Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts [14.999359332108767]
We propose DistilWhisper to bridge the performance gap in ASR for under-represented languages.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
Results demonstrate that our approach is more effective than standard fine-tuning or LoRA adapters.
arXiv Detail & Related papers (2023-11-02T08:37:30Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- Adapting Multi-Lingual ASR Models for Handling Multiple Talkers [63.151811561972515]
State-of-the-art large-scale universal speech models (USMs) show decent automatic speech recognition (ASR) performance across multiple domains and languages.
We propose an approach to adapt USMs for multi-talker ASR.
We first develop an enhanced version of serialized output training to jointly perform multi-talker ASR and utterance timestamp prediction.
arXiv Detail & Related papers (2023-05-30T05:05:52Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Learning ASR pathways: A sparse multilingual ASR model [31.147484652643282]
We present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks ("pathways").
With the overlapping sub-networks, the shared parameters can also enable knowledge transfer for lower-resource languages via joint multilingual training.
Our proposed ASR pathways outperform both dense models and a language-agnostically pruned model, and provide better performance on low-resource languages.
arXiv Detail & Related papers (2022-09-13T05:14:08Z)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network to the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, replacing the activation function from source language to target language, and (2) multilingual learning.
Our experiments on IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.