Learning ASR pathways: A sparse multilingual ASR model
- URL: http://arxiv.org/abs/2209.05735v4
- Date: Thu, 28 Sep 2023 20:24:33 GMT
- Title: Learning ASR pathways: A sparse multilingual ASR model
- Authors: Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, Ozlem
Kalinli
- Abstract summary: We present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks ("pathways"), so that the parameters for each language are learned explicitly.
With the overlapping sub-networks, the shared parameters can also enable knowledge transfer for lower-resource languages via joint multilingual training.
Our proposed ASR pathways outperform both dense models and a language-agnostically pruned model, and provide better performance on low-resource languages than monolingual sparse models.
- Score: 31.147484652643282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network pruning compresses automatic speech recognition (ASR) models
effectively. However, in multilingual ASR, language-agnostic pruning may lead to
severe performance drops on some languages, because a language-agnostic pruning
mask may not fit all languages and can discard important language-specific
parameters. In this work, we present ASR pathways, a sparse multilingual ASR
model that activates language-specific sub-networks ("pathways"), such that the
parameters for each language are learned explicitly. With the overlapping
sub-networks, the shared parameters can also enable knowledge transfer for
lower-resource languages via joint multilingual training. We propose a novel
algorithm to learn ASR pathways, and evaluate the proposed method on 4
languages with a streaming RNN-T model. Our proposed ASR pathways outperform
both dense models and a language-agnostically pruned model, and provide better
performance on low-resource languages compared to the monolingual sparse
models.
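To make the pathway mechanism concrete, here is a minimal PyTorch sketch, assuming a toy linear layer, fixed random masks of a given density, and placeholder language codes; none of these names, shapes, or mask choices come from the paper, which learns the pathway masks jointly during multilingual training.

```python
import torch
import torch.nn as nn

class PathwayLinear(nn.Module):
    """A linear layer with weights shared across languages but masked
    by a per-language binary "pathway" at forward time.

    Minimal illustrative sketch, not the authors' implementation:
    real pathway masks are learned during training, while here they
    are fixed random masks of a given density.
    """

    def __init__(self, in_dim, out_dim, languages, density=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        # One binary mask per language; entries where two masks overlap
        # are the shared parameters that enable cross-lingual transfer.
        self.masks = {
            lang: (torch.rand(out_dim, in_dim) < density).float()
            for lang in languages
        }

    def forward(self, x, lang):
        w = self.weight * self.masks[lang]  # activate this language's pathway
        return x @ w.T + self.bias

# Placeholder language codes; the paper's 4 evaluation languages are
# not spelled out in this summary.
layer = PathwayLinear(80, 256, languages=["lang1", "lang2", "lang3", "lang4"])
frames = torch.randn(4, 80)        # toy batch of acoustic feature vectors
out = layer(frames, lang="lang2")  # only lang2's sub-network is active
```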
Related papers
- Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages [51.12146889808824]
Meta-Whisper is a novel approach to improve automatic speech recognition for low-resource languages.
It enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning.
arXiv Detail & Related papers (2024-09-16T16:04:16Z)
- Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model [27.916389316449443]
We propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently.
Our approach dynamically adapts the sub-network, avoiding premature decisions about a fixed sub-network structure.
We show that our approach outperforms existing pruning methods when targeting sparse monolingual models.
arXiv Detail & Related papers (2023-09-22T17:30:28Z)
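A hedged sketch of the adaptive-masking idea in the entry above: rather than committing to a fixed pruning mask up front, the mask is periodically re-derived from current weight magnitudes during training, so the sub-network structure can keep changing. The magnitude criterion, the refresh schedule, and the assumption that `model(batch)` returns a scalar loss are all illustrative, not taken from the paper.

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude entries of `weight`; zero the rest."""
    n = weight.numel()
    k = int(n * (1.0 - sparsity))   # number of weights to keep
    thresh = weight.abs().flatten().kthvalue(n - k + 1).values
    return (weight.abs() >= thresh).float()

def train_step(model, batch, step, masks, sparsity=0.7, refresh_every=1000):
    # Re-estimate masks periodically instead of fixing them up front,
    # so the sub-network can adapt as training reshapes the weights.
    if step % refresh_every == 0:
        for name, param in model.named_parameters():
            if param.dim() >= 2:    # prune weight matrices only
                masks[name] = magnitude_mask(param.data, sparsity)
    loss = model(batch)             # assumed to return a scalar training loss
    loss.backward()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.grad.mul_(masks[name])  # no updates to pruned weights
                param.data.mul_(masks[name])  # keep pruned weights at zero
```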
- Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition [31.575930914290762]
Exploiting cross-lingual resources is an effective way to compensate for the data scarcity of low-resource languages.
We extend the concept of learnable cross-lingual mappings to end-to-end speech recognition.
The results show that any source-language ASR model can be used for low-resource target-language recognition.
arXiv Detail & Related papers (2023-06-14T15:24:31Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages than multiclass classification; the two heads are sketched after this entry.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
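The multilabel-versus-multiclass distinction in the entry above is easy to see in code: a multiclass classifier must push its probability mass onto some known language via a softmax, while a multilabel head scores each language independently with sigmoids, so an unseen non-target language can simply score low everywhere. A minimal sketch with assumed dimensions and an assumed rejection threshold:

```python
import torch
import torch.nn as nn

emb_dim, n_langs = 256, 10            # assumed sizes, not from the paper
utterance = torch.randn(1, emb_dim)   # a pooled utterance embedding

# Multiclass: softmax must put its mass on one of the known languages,
# so speech from an unseen language is still confidently assigned.
multiclass_head = nn.Linear(emb_dim, n_langs)
probs = torch.softmax(multiclass_head(utterance), dim=-1)  # rows sum to 1

# Multilabel: independent sigmoids can all stay low at once, which is
# what allows rejecting an unseen non-target language.
multilabel_head = nn.Linear(emb_dim, n_langs)
scores = torch.sigmoid(multilabel_head(utterance))         # each in (0, 1)
is_known = bool(scores.max() > 0.5)   # illustrative rejection threshold
```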
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition [30.460501537763736]
We introduce an adaptive activation network into the upper layers of the ASR model.
We also propose two approaches to train the model: (1) cross-lingual learning, which replaces the activation function of the source language with that of the target language, and (2) multilingual learning; the activation swap is sketched after this entry.
Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z)
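A minimal sketch of the per-language activation idea from the entry above. Modeling the adaptive activation as a small parametric PReLU per language is an assumption for illustration; the point it shows is that the shared layer weights stay put while only the activation is swapped from the source language to the target language.

```python
import torch
import torch.nn as nn

class AdaptiveActivationBlock(nn.Module):
    """Upper-layer block with a swappable, language-specific activation.

    Illustrative sketch only: the per-language "adaptive activation"
    is modeled as a parametric PReLU, which is an assumption; the
    shared linear weights are common to all languages.
    """

    def __init__(self, dim, languages):
        super().__init__()
        self.shared = nn.Linear(dim, dim)
        self.acts = nn.ModuleDict({lang: nn.PReLU(dim) for lang in languages})

    def forward(self, x, lang):
        return self.acts[lang](self.shared(x))

block = AdaptiveActivationBlock(256, languages=["source", "target"])
h = torch.randn(8, 256)
y_src = block(h, "source")
# Cross-lingual learning as sketched: freeze the shared weights and
# train only the target language's activation parameters.
for p in block.shared.parameters():
    p.requires_grad = False
y_tgt = block(h, "target")
```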
- A Survey of Multilingual Models for Automatic Speech Recognition [6.657361001202456]
Cross-lingual transfer is an attractive solution to the problem of multilingual Automatic Speech Recognition.
Recent advances in Self Supervised Learning are opening up avenues for unlabeled speech data to be used in multilingual ASR models.
We present best practices for building multilingual models from research across diverse languages and techniques.
arXiv Detail & Related papers (2022-02-25T09:31:40Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- How Phonotactics Affect Multilingual and Zero-shot ASR Performance [74.70048598292583]
A Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training.
We replace the encoder-decoder with a hybrid ASR system consisting of a separate acoustic model (AM) and language model (LM).
We show that the gain from modeling cross-lingual phonotactics is limited, and that imposing too strong a phonotactic model can hurt zero-shot transfer.
arXiv Detail & Related papers (2020-10-22T23:07:24Z)
- Streaming End-to-End Bilingual ASR Systems with Joint Language Identification [19.09014345299161]
We introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification.
The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India.
arXiv Detail & Related papers (2020-07-08T05:00:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.