Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0
- URL: http://arxiv.org/abs/2110.03560v1
- Date: Thu, 7 Oct 2021 15:29:22 GMT
- Title: Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0
- Authors: Sameer Khurana, Antoine Laurent, James Glass
- Abstract summary: We show that a monolingual wav2vec-2.0 is a good few-shot ASR learner in several languages.
A key finding of this work is that the adapted monolingual wav2vec-2.0 achieves similar performance as the topline multilingual XLSR model.
- Score: 7.378368959253632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simple and effective cross-lingual transfer learning method to
adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in
resource-scarce languages. We show that a monolingual wav2vec-2.0 is a good
few-shot ASR learner in several languages. We improve its performance further
via several iterations of Dropout Uncertainty-Driven Self-Training (DUST) by
using a moderate-sized unlabeled speech dataset in the target language. A key
finding of this work is that the adapted monolingual wav2vec-2.0 achieves
similar performance as the topline multilingual XLSR model, which is trained on
fifty-three languages, on the target language ASR task.
Related papers
- Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech
Models via Language-Specific Experts [14.999359332108767]
We propose DistilWhisper to bridge the performance gap in ASR for under-represented languages.
Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2.
Results demonstrate that our approach is more effective than standard fine-tuning or LoRA adapters.
arXiv Detail & Related papers (2023-11-02T08:37:30Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
Main ingredients are a new dataset based on readings of publicly available religious texts.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
arXiv Detail & Related papers (2023-05-22T22:09:41Z) - Hindi as a Second Language: Improving Visually Grounded Speech with
Semantically Similar Samples [89.16814518860357]
The objective of this work is to explore the learning of visually grounded speech models (VGS) from multilingual perspective.
Our key contribution in this work is to leverage the power of a high-resource language in a bilingual visually grounded speech model to improve the performance of a low-resource language.
arXiv Detail & Related papers (2023-03-30T16:34:10Z) - From English to More Languages: Parameter-Efficient Model Reprogramming
for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z) - Distilling a Pretrained Language Model to a Multilingual ASR Model [3.4012007729454816]
We distill the rich knowledge embedded inside a well-trained teacher text model to the student speech model.
We show the superiority of our method on 20 low-resource languages of the CommonVoice dataset with less than 100 hours of speech data.
arXiv Detail & Related papers (2022-06-25T12:36:11Z) - Adaptive Activation Network For Low Resource Multilingual Speech
Recognition [30.460501537763736]
We introduce an adaptive activation network to the upper layers of ASR model.
We also proposed two approaches to train the model: (1) cross-lingual learning, replacing the activation function from source language to target language, and (2) multilingual learning.
Our experiments on IARPA Babel datasets demonstrated that our approaches outperform the from-scratch training and traditional bottleneck feature based methods.
arXiv Detail & Related papers (2022-05-28T04:02:59Z) - Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific.
We compare the performance of end to end multilingual speech recognition system to the performance of monolingual models conditioned on language identification (LID)
We also propose a similar technique to solve the Code Switched problem and achieve a WER of 21.77 and 28.27 over Hindi-English and Bengali-English respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-source languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC)
LBMRC trains multiple machine reading comprehension (MRC) models proficient in individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z) - Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters [31.705705891482594]
We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages.
We compare three variants of multilingual training from a single joint model without knowing the input language, to using this information, to multiple heads.
We show that multilingual training of ASR models on several languages can improve recognition performance, in particular, on low resource languages.
arXiv Detail & Related papers (2020-07-06T18:43:38Z) - Unsupervised Cross-lingual Representation Learning for Speech
Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.