Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech
Pre-Training for Adaptation to Unseen Languages
- URL: http://arxiv.org/abs/2305.12606v2
- Date: Wed, 31 May 2023 01:27:41 GMT
- Title: Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech
Pre-Training for Adaptation to Unseen Languages
- Authors: Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris,
Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass
- Abstract summary: Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.
We aim to understand which model adapts better to languages unseen during pre-training.
We fine-tune both models on 13 unseen languages and 18 seen languages.
- Score: 40.41642013737395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent models such as XLS-R and Whisper have made multilingual speech
technologies more accessible by pre-training on audio from around 100 spoken
languages each. However, there are thousands of spoken languages worldwide, and
adapting to new languages is an important problem. In this work, we aim to
understand which model adapts better to languages unseen during pre-training.
We fine-tune both models on 13 unseen languages and 18 seen languages. Our
results show that the number of hours seen per language and language family
during pre-training is predictive of how the models compare, despite the
significant differences in the pre-training methods.
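As a rough illustration of the adaptation setup described in the abstract, the sketch below fine-tunes both model families on a new language using the HuggingFace transformers library: XLS-R receives a freshly initialized CTC head over a target-language character vocabulary, while Whisper continues its sequence-to-sequence training on transcripts. The checkpoints, toy vocabulary, and dummy audio/transcript are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact recipe) of adapting XLS-R and Whisper
# to a language unseen during pre-training, via HuggingFace transformers.
import json
import torch
from transformers import (
    Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC,
    WhisperForConditionalGeneration, WhisperProcessor,
)

# XLS-R: the self-supervised checkpoint has no output vocabulary, so we define a
# (toy) character set for the target language and attach a new CTC head.
vocab = {tok: i for i, tok in enumerate(["[PAD]", "[UNK]", "|", "a", "b", "c"])}
with open("vocab.json", "w") as f:
    json.dump(vocab, f)
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
xlsr = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    vocab_size=len(vocab),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)

# Whisper: the weakly-supervised checkpoint already has a multilingual BPE
# vocabulary, so fine-tuning simply continues seq2seq training on new transcripts.
whisper_processor = WhisperProcessor.from_pretrained("openai/whisper-small")
whisper = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Toy batch: one second of silence standing in for real target-language audio.
audio = torch.zeros(16000).numpy()
transcript = "abc"

# CTC loss for XLS-R.
xlsr_inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
xlsr_labels = tokenizer(transcript, return_tensors="pt").input_ids
xlsr_loss = xlsr(input_values=xlsr_inputs.input_values, labels=xlsr_labels).loss

# Cross-entropy loss for Whisper.
whisper_inputs = whisper_processor(audio, sampling_rate=16000, return_tensors="pt")
whisper_labels = whisper_processor.tokenizer(transcript, return_tensors="pt").input_ids
whisper_loss = whisper(input_features=whisper_inputs.input_features,
                       labels=whisper_labels).loss

print(f"XLS-R CTC loss: {xlsr_loss.item():.3f}, Whisper loss: {whisper_loss.item():.3f}")
# In a real run, these losses would be minimized with an optimizer (e.g. AdamW)
# over labeled data for each of the unseen and seen languages.
```

The main structural difference this highlights is that XLS-R needs a new output layer per target vocabulary, whereas Whisper reuses its existing multilingual tokenizer.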
Related papers
- Improved Self-Supervised Multilingual Speech Representation Learning
Combined with Auxiliary Language Information [21.250763472985824]
Self-supervised multilingual speech representation learning has shown success in improving the performance of multilingual automatic speech recognition.
However, similar to supervised learning, multilingual pre-training may also suffer from language interference.
We introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information.
arXiv Detail & Related papers (2022-12-07T06:18:59Z)
- Towards continually learning new languages [66.36852845415916]
Adding new languages after prior training, rather than batch-learning all languages at once, can be economically beneficial, but the main challenge is catastrophic forgetting.
We combine the qualities of weight factorization and elastic weight consolidation in order to counter catastrophic forgetting.
Our approach scales to 26 languages without catastrophic forgetting and achieves reasonable performance compared to training on all languages from scratch.
arXiv Detail & Related papers (2022-11-21T18:24:34Z)
- M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state of the art by a wide margin, both when training separate models for each language and with a single model that processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Phylogeny-Inspired Adaptation of Multilingual Models to New Languages [43.62238334380897]
We show how language phylogenetic information can be used to improve cross-lingual transfer by leveraging closely related languages.
We perform adapter-based training on languages from diverse language families (Germanic, Uralic, Tupian, Uto-Aztecan) and evaluate on both syntactic and semantic tasks.
arXiv Detail & Related papers (2022-05-19T15:49:19Z)
- Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability [26.553524219316188]
Pretrained multilingual models enable zero-shot learning even for unseen languages.
It is unclear how the number of pretraining languages influences a model's zero-shot learning for languages unseen during pretraining.
arXiv Detail & Related papers (2022-03-21T06:52:38Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- CLSRIL-23: Cross Lingual Speech Representations for Indic Languages [0.0]
CLSRIL-23 is a self-supervised model which learns cross-lingual speech representations from raw audio across 23 Indic languages.
It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations (a simplified sketch of this objective appears after the list).
We track the language-wise loss during pretraining to compare the effects of monolingual and multilingual pretraining.
arXiv Detail & Related papers (2021-07-15T15:42:43Z)
- Improved Language Identification Through Cross-Lingual Self-Supervised Learning [37.32193095549614]
We extend previous self-supervised work on language identification by experimenting with pre-trained models.
Results on a 25-language setup show that with only 10 minutes of labeled data per language, a cross-lingually pre-trained model can achieve over 93% accuracy.
arXiv Detail & Related papers (2021-07-08T19:37:06Z)
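The CLSRIL-23 entry above mentions the wav2vec 2.0 contrastive task over masked latent speech representations. The sketch below is a simplified, self-contained version of that objective, with random tensors standing in for real context vectors and quantized latents; the tensor shapes, temperature value, and number of distractors are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def wav2vec2_style_contrastive_loss(context, positives, negatives, temperature=0.1):
    """Simplified contrastive objective in the spirit of wav2vec 2.0.

    context:   (T, D) context-network outputs at masked time steps
    positives: (T, D) true quantized latents for the same time steps
    negatives: (T, K, D) K distractor quantized latents per time step
    """
    # Stack the true latent and the distractors as candidates for each step.
    candidates = torch.cat([positives.unsqueeze(1), negatives], dim=1)      # (T, 1+K, D)
    # Cosine similarity between each context vector and its candidates.
    sims = F.cosine_similarity(context.unsqueeze(1), candidates, dim=-1)    # (T, 1+K)
    sims = sims / temperature
    # The true latent always sits at index 0, so this is (1+K)-way classification.
    targets = torch.zeros(sims.size(0), dtype=torch.long)
    return F.cross_entropy(sims, targets)

# Toy usage with random tensors.
T, K, D = 8, 10, 256
loss = wav2vec2_style_contrastive_loss(torch.randn(T, D), torch.randn(T, D),
                                       torch.randn(T, K, D))
print(loss.item())
```

In the actual model, the positives come from a quantization module and the distractors are sampled from other masked time steps of the same utterance.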
This list is automatically generated from the titles and abstracts of the papers on this site.