Using Radio Archives for Low-Resource Speech Recognition: Towards an
Intelligent Virtual Assistant for Illiterate Users
- URL: http://arxiv.org/abs/2104.13083v1
- Date: Tue, 27 Apr 2021 10:09:34 GMT
- Title: Using Radio Archives for Low-Resource Speech Recognition: Towards an
Intelligent Virtual Assistant for Illiterate Users
- Authors: Moussa Doumbouya, Lisa Einstein, Chris Piech
- Abstract summary: In many countries, illiterate people tend to speak only low-resource languages.
We investigate the effectiveness of unsupervised speech representation learning on noisy radio broadcasting archives.
Our contributions offer a path forward for ethical AI research to serve the needs of those most disadvantaged by the digital divide.
- Score: 3.3946853660795884
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For many of the 700 million illiterate people around the world, speech
recognition technology could provide a bridge to valuable information and
services. Yet, those most in need of this technology are often the most
underserved by it. In many countries, illiterate people tend to speak only
low-resource languages, for which the datasets necessary for speech technology
development are scarce. In this paper, we investigate the effectiveness of
unsupervised speech representation learning on noisy radio broadcasting
archives, which are abundant even in low-resource languages. We make three core
contributions. First, we release two datasets to the research community. The
first, West African Radio Corpus, contains 142 hours of audio in more than 10
languages with a labeled validation subset. The second, West African Virtual
Assistant Speech Recognition Corpus, consists of 10K labeled audio clips in
four languages. Next, we share West African wav2vec, a speech encoder trained
on the noisy radio corpus, and compare it with the baseline Facebook speech
encoder trained on six times more data of higher quality. We show that West
African wav2vec performs similarly to the baseline on a multilingual speech
recognition task, and significantly outperforms the baseline on a West African
language identification task. Finally, we share the first-ever speech
recognition models for Maninka, Pular and Susu, languages spoken by a combined
10 million people in over seven countries, including six where the majority of
the adult population is illiterate. Our contributions offer a path forward for
ethical AI research to serve the needs of those most disadvantaged by the
digital divide.
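As a concrete illustration of the paper's core technique, the sketch below extracts frame-level self-supervised speech representations from an audio clip, of the kind the paper feeds to its downstream speech recognition and language identification heads. It uses torchaudio's generic pretrained wav2vec 2.0 bundle as a stand-in; the authors' actual encoder is the original wav2vec architecture trained on the West African Radio Corpus, and the audio filename here is hypothetical.

```python
# Sketch: frame-level self-supervised speech features, in the spirit
# of the paper's West African wav2vec encoder. The pretrained bundle
# below is a generic stand-in, not the authors' released model.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE       # pretrained wav2vec 2.0
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("radio_clip.wav")  # hypothetical clip
waveform = waveform.mean(dim=0, keepdim=True)     # downmix to mono
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    features, _ = model.extract_features(waveform)

# `features` is a list of per-layer tensors shaped (batch, frames, dim).
# Frame-level representations like these feed the downstream ASR and
# West African language-identification heads in the paper's experiments.
print(features[-1].shape)
```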
Related papers
- Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z) - Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech [0.5330251011543498]
We use a dataset of more than 45 hours of audio from 633 individual male and female speakers.
Our best model achieves an accuracy of 85.44%.
arXiv Detail & Related papers (2024-04-18T10:17:20Z) - AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z) - Scaling Speech Technology to 1,000+ Languages [66.31120979098483]
The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
The main ingredient is a new dataset based on readings of publicly available religious texts.
We built pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for the same number of languages, and a language identification model for 4,017 languages.
arXiv Detail & Related papers (2023-05-22T22:09:41Z) - ASR2K: Speech Recognition for Around 2000 Languages without Audio [100.41158814934802]
We present a speech recognition pipeline that does not require any audio for the target language.
Our pipeline consists of three components: acoustic, pronunciation, and language models.
We build speech recognition systems for 1,909 languages by combining the pipeline with Crubadan, a large n-gram database of endangered languages.
arXiv Detail & Related papers (2022-09-06T22:48:29Z) - Building African Voices [125.92214914982753]
This paper focuses on speech synthesis for low-resourced African languages.
We create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources.
We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
arXiv Detail & Related papers (2022-07-01T23:28:16Z) - Brazilian Portuguese Speech Recognition Using Wav2vec 2.0 [0.26097841018267615]
This work presents the development of a public Automatic Speech Recognition system using only openly available audio data.
The final model achieves a Word Error Rate of 11.95% on the Common Voice dataset.
To the best of our knowledge, this is 13% lower than the best open Automatic Speech Recognition model for Brazilian Portuguese.
arXiv Detail & Related papers (2021-07-23T18:54:39Z) - Unsupervised Speech Recognition [55.864459085947345]
wav2vec-U, short for wav2vec Unsupervised, is a method to train speech recognition models without any labeled data.
We leverage self-supervised speech representations to segment unlabeled audio and learn a mapping from these representations to phonemes via adversarial training (a toy sketch of this adversarial setup appears after this list).
On the larger English Librispeech benchmark, wav2vec-U achieves a word error rate of 5.9 on test-other, rivaling some of the best published systems trained on 960 hours of labeled data from only two years ago.
arXiv Detail & Related papers (2021-05-24T04:10:47Z) - Applying Wav2vec2.0 to Speech Recognition in Various Low-resource
Languages [16.001329145018687]
In the speech domain, wav2vec 2.0 has begun to show its powerful representation ability and the feasibility of ultra-low-resource speech recognition on the Librispeech corpus.
However, wav2vec 2.0 has not been examined in real spoken scenarios or on languages other than English.
We apply pre-trained models to solve low-resource speech recognition tasks in various spoken languages.
arXiv Detail & Related papers (2020-12-22T15:59:44Z) - Towards End-to-End Training of Automatic Speech Recognition for Nigerian
Pidgin [0.0]
Nigerian Pidgin is one of the most widely spoken languages in West Africa.
We present the first parallel speech-to-text dataset for Nigerian Pidgin.
We also trained the first end-to-end speech recognition system on this language.
arXiv Detail & Related papers (2020-10-21T16:32:58Z)
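The wav2vec-U entry above hinges on one idea worth making concrete: learn a features-to-phonemes mapping with no paired data by pitting a generator against a discriminator that has only seen unpaired phonemized text. The toy sketch below shows that adversarial loop with random stand-in tensors; all dimensions, architectures, and hyperparameters are illustrative, not those of the paper.

```python
# Toy sketch of the adversarial features-to-phonemes idea behind
# wav2vec-U. Shapes and data are random stand-ins, not the paper's.
import torch
import torch.nn as nn

FEAT_DIM, N_PHONES, BATCH = 512, 40, 8   # illustrative sizes

# Generator: frozen speech features -> per-frame phoneme logits.
generator = nn.Conv1d(FEAT_DIM, N_PHONES, kernel_size=4, padding=2)

# Discriminator: phoneme distributions -> real/fake score.
discriminator = nn.Sequential(
    nn.Conv1d(N_PHONES, 256, kernel_size=6, padding=3),
    nn.GELU(),
    nn.Conv1d(256, 1, kernel_size=6, padding=3),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def real_phoneme_batch(length):
    # Stand-in for unpaired phonemized text, as one-hot sequences.
    ids = torch.randint(0, N_PHONES, (BATCH, length))
    return nn.functional.one_hot(ids, N_PHONES).float().transpose(1, 2)

speech_feats = torch.randn(BATCH, FEAT_DIM, 100)  # stand-in wav2vec features

for step in range(3):                             # tiny demonstration loop
    fake = generator(speech_feats).softmax(dim=1)
    real = real_phoneme_batch(fake.shape[-1])

    # Discriminator step: score real text high, generated output low.
    d_loss = (bce(discriminator(real).mean(dim=-1), torch.ones(BATCH, 1))
              + bce(discriminator(fake.detach()).mean(dim=-1), torch.zeros(BATCH, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: make generated phoneme sequences look like text.
    g_loss = bce(discriminator(fake).mean(dim=-1), torch.ones(BATCH, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```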