Building African Voices
- URL: http://arxiv.org/abs/2207.00688v1
- Date: Fri, 1 Jul 2022 23:28:16 GMT
- Title: Building African Voices
- Authors: Perez Ogayo, Graham Neubig, Alan W Black
- Abstract summary: This paper focuses on speech synthesis for low-resourced African languages.
We create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources.
We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern speech synthesis techniques can produce natural-sounding speech given
sufficient high-quality data and compute resources. However, such data is not
readily available for many languages. This paper focuses on speech synthesis
for low-resourced African languages, from corpus creation to sharing and
deploying the Text-to-Speech (TTS) systems. We first create a set of
general-purpose instructions on building speech synthesis systems with minimum
technological resources and subject-matter expertise. Next, we create new
datasets and curate datasets from "found" data (existing recordings) through a
participatory approach while considering accessibility, quality, and breadth.
We demonstrate that we can develop synthesizers that generate intelligible
speech with 25 minutes of created speech, even when recorded in suboptimal
environments. Finally, we release the speech data, code, and trained voices for
12 African languages to support researchers and developers.
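As a rough illustration of how small a 25-minute corpus is, the sketch below estimates its raw storage footprint. The recording format is an assumption on my part (16 kHz mono 16-bit PCM is common for TTS corpora); the abstract states only the duration.

```python
# Back-of-the-envelope size of a 25-minute speech corpus.
# Assumed format (not stated in the abstract): 16 kHz, mono, 16-bit PCM.

MINUTES = 25
SAMPLE_RATE_HZ = 16_000   # assumed sample rate
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono

seconds = MINUTES * 60
n_samples = seconds * SAMPLE_RATE_HZ * CHANNELS
size_mb = n_samples * BYTES_PER_SAMPLE / 1_000_000

print(f"{n_samples} samples, ~{size_mb:.0f} MB")  # 24000000 samples, ~48 MB
```

Under these assumptions the entire training corpus is about 48 MB of raw audio, which underscores how little data the authors needed for an intelligible voice.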
Related papers
- Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection (arXiv, 2024-09-17)
  Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
  Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
  We propose an approach to enhance SER performance in low-resource languages by leveraging data from high-resource languages.
- 1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis (arXiv, 2024-06-17)
  Afro-TTS is the first pan-African accented English speech synthesis system.
  Speakers retain naturalness and accentedness, enabling the creation of new voices.
- Meta Learning Text-to-Speech Synthesis in over 7000 Languages (arXiv, 2024-06-10)
  In this work, we take on the challenging task of building a single text-to-speech synthesis system capable of generating speech in over 7000 languages.
  By leveraging a novel integration of massively multilingual pretraining and meta learning, our approach enables zero-shot speech synthesis in languages without any available data.
  We aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
- MunTTS: A Text-to-Speech System for Mundari (arXiv, 2024-01-28)
  We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family.
  Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system.
- Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation (arXiv, 2023-08-03)
  This paper proposes a textless training method for many-to-many multilingual speech-to-speech translation.
  By treating the speech units as pseudo-text, we can focus on the linguistic content of the speech.
  We demonstrate that the proposed UTUT model can be effectively utilized not only for Speech-to-Speech Translation (S2ST) but also for multilingual Text-to-Speech Synthesis (T2S) and Text-to-Speech Translation (T2ST).
- On decoder-only architecture for speech-to-text and large language model integration (arXiv, 2023-07-08)
  Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models.
  We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
- Multilingual Multiaccented Multispeaker TTS with RADTTS (arXiv, 2023-01-24)
  We present a multilingual, multiaccented, multispeaker speech synthesis model based on RADTTS.
  We demonstrate the ability to control the synthesized accent for any speaker in an open-source dataset comprising 7 accents.
- Automatic Speech Recognition Datasets in Cantonese Language: A Survey and a New Dataset (arXiv, 2022-01-07)
  Our dataset consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong.
  It spans the philosophy, politics, education, culture, lifestyle, and family domains, covering a wide range of topics.
  We create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.
- Textless Speech-to-Speech Translation on Real Data (arXiv, 2021-12-15)
  We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another.
  We tackle the challenge of modeling multi-speaker target speech and train the systems with real-world S2ST data.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.