Large vocabulary speech recognition for languages of Africa:
multilingual modeling and self-supervised learning
- URL: http://arxiv.org/abs/2208.03067v1
- Date: Fri, 5 Aug 2022 09:54:19 GMT
- Title: Large vocabulary speech recognition for languages of Africa:
multilingual modeling and self-supervised learning
- Authors: Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van
Esch, Bo Li, Khe Chai Sim
- Abstract summary: Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems.
We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages.
- Score: 11.408563104045285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Almost none of the 2,000+ languages spoken in Africa have widely available
automatic speech recognition systems, and the required data is also only
available for a few languages. We have experimented with two techniques which
may provide pathways to large vocabulary speech recognition for African
languages: multilingual modeling and self-supervised learning. We gathered
available open source data and collected data for 15 languages, and trained
experimental models using these techniques. Our results show that pooling the
small amounts of data available in multilingual end-to-end models, and
pre-training on unsupervised data can help improve speech recognition quality
for many African languages.
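The paper's own models are not released, but the two techniques it combines can be approximated with open-source tooling. The sketch below assumes Hugging Face `transformers` and `datasets`, uses the XLS-R wav2vec 2.0 checkpoint as a stand-in for the paper's pre-trained encoder, and treats the language codes and the `vocab.json` grapheme vocabulary as illustrative placeholders.

```python
# Sketch: pool small supervised sets into one multilingual corpus, then
# fine-tune a self-supervised multilingual encoder with a CTC head.
# Assumptions: XLS-R as the pre-trained model, Common Voice as the data
# source, and a shared grapheme vocabulary in "vocab.json".
from datasets import Audio, concatenate_datasets, load_dataset
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Multilingual modeling: concatenate the per-language training sets so one
# end-to-end model sees all of them (language codes are illustrative).
langs = ["sw", "lg", "rw"]
pooled = concatenate_datasets(
    [load_dataset("mozilla-foundation/common_voice_11_0", l, split="train") for l in langs]
).cast_column("audio", Audio(sampling_rate=16_000))

# Self-supervised learning: start from an encoder pre-trained on unlabelled
# multilingual speech and add a randomly initialized CTC output layer over a
# grapheme vocabulary shared across the pooled languages.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # common when fine-tuning on small datasets
```

Pooling all transcripts into one shared output vocabulary is what lets a single CTC head serve every language at once, which is the sense in which the small per-language datasets reinforce each other.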
Related papers
- Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge [57.38948190611797]
This paper proposes a novel lip reading framework designed specifically for low-resource languages.
Because these languages lack the video-text paired data needed for training, developing lip reading models for them has been considered challenging.
arXiv Detail & Related papers (2023-08-18T05:19:03Z)
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
- Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages [20.25236081418051]
Zambezi Voice is an open-source multilingual speech resource for Zambian languages.
To our knowledge, this is the first multilingual speech dataset created for Zambian languages.
arXiv Detail & Related papers (2023-06-07T13:36:37Z)
- Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
The main ingredient is a new dataset based on readings of publicly available religious texts.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
arXiv Detail & Related papers (2023-05-22T22:09:41Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models trained on more data outperform monolingual ones, but that, with the amount of data held fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model for a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that if a model pretrained on a related speech variety, or on an unrelated language with similar phonological characteristics, is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance.
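A rough sketch of the continued-pretraining recipe this entry describes, assuming PyTorch and Hugging Face `transformers` (the two masking helpers are library internals used by its official pretraining example, and the checkpoint name is illustrative):

```python
# Sketch: resume the wav2vec 2.0 contrastive pretraining objective on
# unlabelled target-language audio before any supervised fine-tuning.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-xls-r-300m")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

def continued_pretraining_step(waveforms):
    """One gradient step on a batch of unlabelled 16 kHz waveforms."""
    inputs = feature_extractor(
        waveforms, sampling_rate=16_000, return_tensors="pt", padding=True
    )
    batch_size = inputs.input_values.shape[0]
    seq_len = int(model._get_feat_extract_output_lengths(inputs.input_values.shape[-1]))
    # Mask spans of latent frames; the model must identify the true quantized
    # latent for each masked frame among sampled negatives (contrastive loss).
    mask = _compute_mask_indices((batch_size, seq_len), mask_prob=0.65, mask_length=10)
    negatives = _sample_negative_indices(
        (batch_size, seq_len),
        num_negatives=model.config.num_negatives,
        mask_time_indices=mask,
    )
    out = model(
        input_values=inputs.input_values,
        mask_time_indices=torch.tensor(mask, dtype=torch.bool),
        sampled_negative_indices=torch.tensor(negatives, dtype=torch.long),
    )
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```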
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
- Adaptive multilingual speech recognition with pretrained models [24.01587237432548]
We investigate the effectiveness of two pretrained models for two modalities: wav2vec 2.0 for audio and MBART50 for text.
Overall, we observed a 44% improvement over purely supervised learning.
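One way to combine the two modalities, sketched here with Hugging Face `transformers` as an assumed stand-in for the paper's setup (the checkpoints and the Swahili language code are illustrative choices):

```python
# Sketch: attach a self-supervised speech encoder to a multilingual text
# decoder and fine-tune the pair as a sequence-to-sequence ASR model.
from transformers import (
    MBart50Tokenizer,
    SpeechEncoderDecoderModel,
    Wav2Vec2FeatureExtractor,
)

model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m",  # audio encoder (wav2vec 2.0)
    "facebook/mbart-large-50",       # text decoder (mBART-50)
)
tokenizer = MBart50Tokenizer.from_pretrained("facebook/mbart-large-50")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")

# Tie the special tokens the decoder needs before fine-tuning; starting
# generation from a language-code token is an mBART convention.
model.config.decoder_start_token_id = tokenizer.lang_code_to_id["sw_KE"]
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id
```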
arXiv Detail & Related papers (2022-05-24T18:29:07Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
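The underlying objective is the standard teacher-student (knowledge distillation) loss; a generic PyTorch version, not the paper's exact formulation, looks like this:

```python
# Sketch: a frozen text-based teacher (e.g. a multilingual NLP model scoring
# intents from transcripts) supervises an acoustic student that predicts
# intents directly from speech.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft targets from the teacher with the hard intent labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard temperature scaling of the distillation term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```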
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Improved Language Identification Through Cross-Lingual Self-Supervised Learning [37.32193095549614]
We extend previous self-supervised work on language identification by experimenting with pre-trained models.
Results on a 25-language setup show that with only 10 minutes of labeled data per language, a cross-lingually pre-trained model can achieve over 93% accuracy.
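A minimal sketch of this kind of low-resource language identification, assuming Hugging Face `transformers`; the checkpoint and label set are illustrative rather than the paper's configuration:

```python
# Sketch: fine-tune a cross-lingually pretrained encoder as a spoken
# language-ID classifier using only minutes of labelled audio per language.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

langs = ["sw", "lg", "rw", "yo"]  # hypothetical 4-language setup
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    num_labels=len(langs),
    label2id={l: i for i, l in enumerate(langs)},
    id2label={i: l for i, l in enumerate(langs)},
)
# With ~10 minutes per language, freeze the encoder and train only the
# classification head to avoid overfitting.
model.freeze_base_model()
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
```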
arXiv Detail & Related papers (2021-07-08T19:37:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.