Pseudo-Labeling for Massively Multilingual Speech Recognition
- URL: http://arxiv.org/abs/2111.00161v1
- Date: Sat, 30 Oct 2021 03:30:17 GMT
- Title: Pseudo-Labeling for Massively Multilingual Speech Recognition
- Authors: Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
- Abstract summary: Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems.
We propose a simple pseudo-labeling recipe that works well even with low-resource languages.
- Score: 34.295967235026936
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Semi-supervised learning through pseudo-labeling has become a staple of
state-of-the-art monolingual speech recognition systems. In this work, we
extend pseudo-labeling to massively multilingual speech recognition with 60
languages. We propose a simple pseudo-labeling recipe that works well even with
low-resource languages: train a supervised multilingual model, fine-tune it
with semi-supervised learning on a target language, generate pseudo-labels for
that language, and train a final model using pseudo-labels for all languages,
either from scratch or by fine-tuning. Experiments on the labeled Common Voice
and unlabeled VoxPopuli datasets show that our recipe can yield a model with
better performance for many languages that also transfers well to LibriSpeech.
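The four-stage recipe in the abstract can be sketched end to end. This is a hypothetical toy illustration, not the paper's code: the "model" is just a lookup table standing in for a real acoustic model, and all utterance IDs and transcripts are made up.

```python
# Hypothetical sketch of the four-stage pseudo-labeling recipe; the dict
# "model" is a stand-in for a trained multilingual acoustic model.

def train_supervised(labeled):
    # Stage 1: train a supervised multilingual seed model.
    return dict(labeled)

def finetune(model, target_labeled):
    # Stage 2: fine-tune the seed model on one target language.
    adapted = dict(model)
    adapted.update(target_labeled)
    return adapted

def generate_pseudo_labels(model, unlabeled):
    # Stage 3: transcribe unlabeled audio with the adapted model,
    # keeping only utterances the model can decode.
    return {utt: model[utt] for utt in unlabeled if utt in model}

def train_final(labeled, pseudo):
    # Stage 4: train a final model (from scratch or by fine-tuning)
    # on labeled plus pseudo-labeled data for all languages.
    combined = dict(labeled)
    combined.update(pseudo)
    return combined

labeled = {"en_1": "hello", "fr_1": "bonjour"}   # illustrative data
target = {"fr_2": "merci"}
unlabeled = ["fr_1", "fr_2", "fr_3"]

seed = train_supervised(labeled)
adapted = finetune(seed, target)
pseudo = generate_pseudo_labels(adapted, unlabeled)
final = train_final(labeled, pseudo)
print(sorted(pseudo))
```

In a real system each stage is a full training run; the point of the sketch is the data flow: labeled data seeds the model, semi-supervised fine-tuning adapts it per language, and the pseudo-labels feed the final multilingual training set.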
Related papers
- Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper [96.43501666278316]

This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages.
We employ a Whisper model which can conduct both language identification and audio-based speech recognition.
By comparing VSR models trained on automatic labels with those trained on human-annotated labels, we show that automatic labels achieve similar VSR performance.
arXiv Detail & Related papers (2023-09-15T16:53:01Z)
- Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition [28.21805271848413]
We leverage an International Phonetic Alphabet (IPA) multilingual model to create frame-level pseudo labels for unlabeled speech.
We use these pseudo labels to guide hidden-unit BERT (HuBERT) based speech pretraining in a phonetically-informed manner.
Our approach outperforms most state-of-the-art methods while using far less pretraining data, both in hours and in language diversity.
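The frame-level pseudo-labeling step described above can be illustrated with a toy nearest-centroid assigner. This is a hypothetical stand-in for the IPA model: it maps each feature frame to its closest phone centroid, and those indices would serve as masked-prediction targets in HuBERT-style pretraining. Centroids and frames are made up.

```python
# Toy stand-in for an IPA model: assign each feature frame the nearest
# phone centroid; the resulting labels would act as pretraining targets
# (analogous to HuBERT's clustered units). 1-D features for simplicity.

centroids = {"a": 0.0, "i": 1.0, "u": 2.0}   # illustrative phone centroids

def frame_pseudo_labels(frames):
    labels = []
    for f in frames:
        # nearest centroid in feature space
        labels.append(min(centroids, key=lambda p: abs(centroids[p] - f)))
    return labels

frames = [0.1, 0.9, 1.1, 2.2, 0.05]
targets = frame_pseudo_labels(frames)
print(targets)
```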
arXiv Detail & Related papers (2023-05-19T10:15:11Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models trained on more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
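The clustering step described above can be sketched with hand-made language vectors. This is a purely illustrative toy, not the paper's method: the 2-D representations and the similarity threshold are invented, and a greedy nearest-group pass stands in for whatever clustering the authors use.

```python
# Toy sketch: group languages whose (hand-made, 2-D) representations lie
# close together into "representation sprachbunds" via a greedy
# distance threshold. Vectors and threshold are illustrative only.

reps = {"es": (1.0, 0.0), "pt": (0.9, 0.1),
        "fi": (0.0, 1.0), "et": (0.1, 0.9)}

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def cluster(reps, thresh=0.5):
    groups = []
    for lang, v in reps.items():
        for g in groups:
            # join the first group whose seed language is close enough
            if dist(v, reps[g[0]]) < thresh:
                g.append(lang)
                break
        else:
            groups.append([lang])
    return groups

sprachbunds = cluster(reps)
print(sprachbunds)
```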
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
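The "minimize the maximal loss for label-preserving input perturbations" idea can be sketched on a toy 1-D logistic model. This is an FGSM-style stand-in under invented data and hyperparameters, not the paper's training procedure: each update perturbs the input in the loss-increasing direction within an epsilon ball, then descends on that worst-case loss.

```python
import math

# Toy adversarial training on a 1-D logistic model: perturb each input
# by the sign of its loss gradient (bounded by eps), then take a
# gradient step on the perturbed loss. Illustrative stand-in only.

def loss(w, x, y):
    p = 1.0 / (1.0 + math.exp(-w * x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_x(w, x, y):
    p = 1.0 / (1.0 + math.exp(-w * x))
    return (p - y) * w          # d loss / d x

def grad_w(w, x, y):
    p = 1.0 / (1.0 + math.exp(-w * x))
    return (p - y) * x          # d loss / d w

data = [(1.0, 1), (2.0, 1), (-1.0, 0), (-2.0, 0)]  # separable toy data
w, eps, lr = 0.1, 0.3, 0.5
for _ in range(200):
    for x, y in data:
        # worst-case label-preserving perturbation within [-eps, eps]
        x_adv = x + eps * (1 if grad_x(w, x, y) > 0 else -1)
        w -= lr * grad_w(w, x_adv, y)

clean_loss = sum(loss(w, x, y) for x, y in data)
```

Training against the perturbed inputs still drives the clean loss down here; the appeal in the self-learning setting is that the resulting model is less sensitive to small input changes, so its pseudo-labels are more stable.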
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Improved acoustic word embeddings for zero-resource languages using multilingual transfer [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages and apply it to unseen zero-resource languages.
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
All of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision.
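The average-precision metric quoted above comes from a same/different-word evaluation, which can be sketched with hand-made embeddings. The vectors below are invented; a real evaluation would use embeddings produced by one of the RNN models described.

```python
# Sketch of the same/different-word evaluation: rank all word pairs by
# embedding similarity and score the ranking with average precision.
# The 2-D "embeddings" are hand-made; real ones come from a trained model.

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return num / (na * nb)

def average_precision(pairs):
    # pairs: (similarity, is_same_word); rank by similarity, descending.
    ranked = sorted(pairs, key=lambda p: -p[0])
    hits, precisions = 0, []
    for i, (_, same) in enumerate(ranked, 1):
        if same:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions)

emb = {
    ("cat", 0): [1.0, 0.1], ("cat", 1): [0.9, 0.2],
    ("dog", 0): [0.1, 1.0], ("dog", 1): [0.2, 0.9],
}
keys = list(emb)
pairs = []
for i in range(len(keys)):
    for j in range(i + 1, len(keys)):
        sim = cosine(emb[keys[i]], emb[keys[j]])
        pairs.append((sim, keys[i][0] == keys[j][0]))

ap = average_precision(pairs)
print(ap)
```

Here same-word pairs all rank above different-word pairs, so AP is perfect; a weaker embedding space would interleave the two and lower the score.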
arXiv Detail & Related papers (2020-06-02T12:28:34Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize the individual languages and transfers that knowledge to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
- Multilingual acoustic word embedding models for processing zero-resource languages [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages.
We then apply it to unseen zero-resource languages.
arXiv Detail & Related papers (2020-02-06T05:53:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.