OkwuGbé: End-to-End Speech Recognition for Fon and Igbo
- URL: http://arxiv.org/abs/2103.07762v2
- Date: Tue, 16 Mar 2021 04:35:06 GMT
- Title: OkwuGbé: End-to-End Speech Recognition for Fon and Igbo
- Authors: Bonaventure F. P. Dossou and Chris C. Emezue
- Abstract summary: We present a state-of-the-art ASR model for Fon, as well as benchmark ASR model results for Igbo.
We conduct a comprehensive linguistic analysis of each language and describe the creation of end-to-end, deep neural network-based speech recognition models for both languages.
- Score: 0.015863809575305417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language is inherent and compulsory for human communication. Whether
expressed in a written or spoken way, it ensures understanding between people
of the same and different regions. With the growing awareness and effort to
include more low-resourced languages in NLP research, African languages have
recently been a major subject of research in machine translation, and other
text-based areas of NLP. However, there is still very little comparable
research in speech recognition for African languages. Interestingly, some of
the unique properties of African languages affecting NLP, like their
diacritical and tonal complexities, have a major root in their speech,
suggesting that careful speech interpretation could provide more intuition on
how to deal with the linguistic complexities of African languages for
text-based NLP. OkwuGbé is a step towards building speech recognition systems
for African low-resourced languages. Using Fon and Igbo as our case study, we
conduct a comprehensive linguistic analysis of each language and describe the
creation of end-to-end, deep neural network-based speech recognition models for
both languages. We present a state-of-the-art ASR model for Fon, as well as
benchmark ASR model results for Igbo. Our linguistic analyses (for Fon and
Igbo) provide valuable insights and guidance into the creation of speech
recognition models for other African low-resourced languages, as well as guide
future NLP research for Fon and Igbo. The source code for the Fon and Igbo
models has been made publicly available.
Related papers
- Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z) - The Ghanaian NLP Landscape: A First Look [9.17372840572907]
Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk.
This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages.
arXiv Detail & Related papers (2024-05-10T21:39:09Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge [57.38948190611797]
This paper proposes a novel lip reading framework, especially for low-resource languages.
Since low-resource languages do not have enough video-text paired data to train the model, it is regarded as challenging to develop lip reading models for low-resource languages.
arXiv Detail & Related papers (2023-08-18T05:19:03Z) - Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili [0.0]
It has emerged that several African indigenous languages, including Kiswahili, are technologically under-resourced.
This paper explores the transcription process and the development of a Kiswahili speech corpus.
It provides an updated Kiswahili phoneme dictionary for the ASR model that was created using the CMU Sphinx speech recognition toolbox.
arXiv Detail & Related papers (2022-10-29T09:04:09Z) - Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z) - Including Signed Languages in Natural Language Processing [48.62744923724317]
Signed languages are the primary means of communication for many deaf and hard of hearing individuals.
This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact.
arXiv Detail & Related papers (2021-05-11T17:37:55Z) - AfriVEC: Word Embedding Models for African Languages. Case Study of Fon and Nobiin [0.015863809575305417]
We build Word2Vec and Poincaré word embedding models for Fon and Nobiin.
Our main contribution is to arouse more interest in creating word embedding models specific to African languages.
arXiv Detail & Related papers (2021-03-08T22:58:20Z) - That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)