Phonemic Representation and Transcription for Speech to Text
Applications for Under-resourced Indigenous African Languages: The Case of
Kiswahili
- URL: http://arxiv.org/abs/2210.16537v1
- Date: Sat, 29 Oct 2022 09:04:09 GMT
- Authors: Ebbie Awino, Lilian Wanzare, Lawrence Muchemi, Barack Wanjawa, Edward
Ombui, Florence Indede, Owen McOnyango, Benard Okal
- Abstract summary: It has emerged that several African indigenous languages, including Kiswahili, are technologically under-resourced.
This paper explores the transcription process and the development of a Kiswahili speech corpus.
It provides an updated Kiswahili phoneme dictionary for the ASR model that was created using the CMU Sphinx speech recognition toolbox.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building automatic speech recognition (ASR) systems is a challenging task,
especially for under-resourced languages that need to construct corpora nearly
from scratch and lack sufficient training data. It has emerged that several
African indigenous languages, including Kiswahili, are technologically
under-resourced. ASR systems are crucial, particularly for hearing-impaired
persons, who can benefit from having transcripts in their native languages.
However, the absence of transcribed speech datasets has complicated efforts to
develop ASR models for these indigenous languages. This paper explores the
transcription process and the development of a Kiswahili speech corpus, which
includes both read-out texts and spontaneous speech data from native Kiswahili
speakers. The study also discusses the vowels and consonants in Kiswahili and
provides an updated Kiswahili phoneme dictionary for the ASR model, which was
created using CMU Sphinx, an open-source speech recognition toolkit. The ASR
model was trained using an extended phonetic set that yielded a WER of 18.87%
and an SER of 49.5%, an improvement over previous comparable research on
under-resourced languages.
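The abstract reports word error rate (WER) and sentence error rate (SER). As a minimal illustration of how these two metrics are conventionally computed (this is not the paper's actual evaluation code, and the sample Kiswahili phrases are invented for demonstration):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def wer_ser(refs, hyps):
    """WER = total word-level edits / total reference words;
    SER = fraction of sentences containing at least one error."""
    total_edits = total_words = wrong_sentences = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        e = edit_distance(r, h)
        total_edits += e
        total_words += len(r)
        wrong_sentences += 1 if e > 0 else 0
    return total_edits / total_words, wrong_sentences / len(refs)

# Invented example transcripts (reference vs. ASR hypothesis).
refs = ["habari ya asubuhi", "ninaenda sokoni leo"]
hyps = ["habari ya asubuhi", "ninaenda soko leo"]
wer, ser = wer_ser(refs, hyps)
```

Here one substitution in six reference words gives a WER of about 16.7%, while one of two sentences being imperfect gives an SER of 50%, which illustrates why SER is typically much higher than WER, as in the 18.87% vs. 49.5% figures above.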
Related papers
- VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka [10.784402571965867]
VoxHakka is a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan.
VoxHakka achieves high naturalness and accuracy and low real-time factor in speech synthesis.
arXiv Detail & Related papers (2024-09-03T02:37:34Z)
- Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach [0.6445605125467574]
This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks.
The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments.
We propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training.
arXiv Detail & Related papers (2024-06-03T15:38:40Z)
- Model Adaptation for ASR in low-resource Indian Languages [28.02064068964355]
Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models like wav2vec2 and large-scale multi-lingual training like Whisper.
A huge challenge still exists for low-resource languages where the availability of both audio and text is limited.
This is where a lot of adaptation and fine-tuning techniques can be applied to overcome the low-resource nature of the data by utilising well-resourced similar languages.
It could be the case that an abundance of acoustic data in a language reduces the need for large text-only corpora.
arXiv Detail & Related papers (2023-07-16T05:25:51Z)
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
- Building African Voices [125.92214914982753]
This paper focuses on speech synthesis for low-resourced African languages.
We create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources.
We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
arXiv Detail & Related papers (2022-07-01T23:28:16Z)
- ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion [49.617722668505834]
We show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training.
It is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
arXiv Detail & Related papers (2022-03-29T11:55:30Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Automatic Speech Recognition Datasets in Cantonese Language: A Survey and a New Dataset [85.52036362232688]
Our dataset consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong.
It combines philosophy, politics, education, culture, lifestyle and family domains, covering a wide range of topics.
We create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.
arXiv Detail & Related papers (2022-01-07T12:09:15Z)
- Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning [13.7466513616362]
This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020.
After a series of lectures and labs on speech data collection using mobile applications, a small group of students and the lecturer continued working on an automatic speech recognition (ASR) project for three languages: Wolof, Ga, and Somali.
This paper describes how data was collected and ASR systems developed with a small amount (1h) of transcribed speech as training data.
arXiv Detail & Related papers (2021-03-16T11:37:03Z)
- OkwuGbé: End-to-End Speech Recognition for Fon and Igbo [0.015863809575305417]
We present a state-of-the-art ASR model for Fon, as well as benchmark ASR model results for Igbo.
We conduct a comprehensive linguistic analysis of each language and describe the creation of end-to-end, deep neural network-based speech recognition models for both languages.
arXiv Detail & Related papers (2021-03-13T18:02:44Z)
- LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition [148.43282526983637]
We develop LRSpeech, a TTS and ASR system for languages with low data cost.
We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech.
We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.
arXiv Detail & Related papers (2020-08-09T08:16:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.