MunTTS: A Text-to-Speech System for Mundari
- URL: http://arxiv.org/abs/2401.15579v1
- Date: Sun, 28 Jan 2024 06:27:17 GMT
- Title: MunTTS: A Text-to-Speech System for Mundari
- Authors: Varun Gumma, Rishav Hada, Aditya Yadavalli, Pamir Gogoi, Ishani
Mondal, Vivek Seshadri, Kalika Bali
- Abstract summary: We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family.
Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system.
- Score: 18.116359188623832
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for
Mundari, a low-resource Indian language of the Austro-Asiatic family. Our work
addresses the gap in linguistic technology for underrepresented languages by
collecting and processing data to build a speech synthesis system. We begin our
study by gathering a substantial dataset of Mundari text and speech and training
end-to-end speech models. We also delve into the methods used for training our
models, ensuring they are efficient and effective despite the data constraints.
We evaluate our system with native speakers and objective metrics,
demonstrating its potential as a tool for preserving and promoting the Mundari
language in the digital age.
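The abstract does not name the objective metrics used; mel cepstral distortion (MCD) is a standard objective measure for TTS and serves as a plausible example. A minimal numpy sketch, assuming frame-aligned mel-cepstral sequences with the energy coefficient dropped (the alignment step, e.g. DTW, is outside this snippet):

```python
import numpy as np

def mel_cepstral_distortion(mc_ref: np.ndarray, mc_syn: np.ndarray) -> float:
    """MCD in dB between frame-aligned mel-cepstral sequences.

    mc_ref, mc_syn: (frames, coeffs) arrays, already time-aligned
    (e.g. via DTW) and with the 0th (energy) coefficient removed.
    """
    assert mc_ref.shape == mc_syn.shape, "sequences must be frame-aligned"
    diff = mc_ref - mc_syn
    # Standard scaling constant: 10 * sqrt(2) / ln(10)
    const = 10.0 * np.sqrt(2.0) / np.log(10.0)
    # Euclidean distance per frame, averaged over frames
    return float(const * np.mean(np.sqrt(np.sum(diff ** 2, axis=1))))
```

Lower MCD indicates synthesized cepstra closer to the reference recording.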
Related papers
- Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in low-resource languages by leveraging data from high-resource languages.
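The summary leaves the bootstrapping data selection step unspecified; a common instantiation is confidence-threshold filtering of pseudo-labels, sketched below (hypothetical helper, numpy only; the threshold value is an assumption):

```python
import numpy as np

def select_bootstrap_samples(probs: np.ndarray, threshold: float = 0.9):
    """Confidence-based selection of pseudo-labeled samples.

    probs: (n_samples, n_classes) softmax outputs of a seed SER model
    applied to speech translated from a high-resource language.
    Returns the indices kept for retraining and their pseudo-labels.
    """
    confidence = probs.max(axis=1)         # top-class probability per sample
    pseudo_labels = probs.argmax(axis=1)   # predicted emotion class
    keep = np.flatnonzero(confidence >= threshold)
    return keep, pseudo_labels[keep]
```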
arXiv Detail & Related papers (2024-09-17T08:36:45Z)
- MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning [0.76146285961466]
We present a methodology for linguistic feature extraction, focusing on automatically syllabifying words in multiple languages.
Across the textual and phonetic domains, the method extracts phonetic transcriptions from text, marks stress, and produces a unified automatic syllabification.
The system was built with open-source components and resources.
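As an illustration of what automatic syllabification involves (a generic maximal-onset sketch, not the paper's actual algorithm; the vowel inventory and legal-onset set are caller-supplied placeholders):

```python
def syllabify(phones, vowels, legal_onsets):
    """Maximal-onset syllabification of a phoneme sequence.

    phones: list of phoneme symbols; vowels: set of nucleus symbols;
    legal_onsets: set of tuples of phonemes allowed syllable-initially.
    """
    nuclei = [i for i, p in enumerate(phones) if p in vowels]
    if not nuclei:
        return [phones]  # no nucleus: return the sequence unsplit
    syllables, start = [], 0
    for j, nuc in enumerate(nuclei[:-1]):
        cluster = phones[nuc + 1 : nuclei[j + 1]]
        split = len(cluster)  # default: every consonant closes this syllable
        for k in range(len(cluster)):
            if tuple(cluster[k:]) in legal_onsets:
                split = k  # longest legal onset goes to the next syllable
                break
        syllables.append(phones[start : nuc + 1 + split])
        start = nuc + 1 + split
    syllables.append(phones[start:])
    return syllables

# e.g. syllabify(list("astra"), {"a"}, {("s", "t", "r"), ("t", "r"), ("r",)})
# -> [["a"], ["s", "t", "r", "a"]]
```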
arXiv Detail & Related papers (2023-10-17T19:27:23Z)
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
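The fusion rests on giving one decoder a vocabulary that spans both modalities: text keeps its original token ids, and discrete audio codes are offset past them. A minimal sketch with assumed vocabulary sizes:

```python
# Assumed sizes for illustration only.
TEXT_VOCAB_SIZE = 32_000   # subword text tokens of the base LM
AUDIO_VOCAB_SIZE = 1_024   # discrete acoustic codes from an audio tokenizer

def audio_token_id(code: int) -> int:
    """Offset an audio code past the text vocabulary so one decoder
    softmax over TEXT_VOCAB_SIZE + AUDIO_VOCAB_SIZE entries can emit
    either modality."""
    return TEXT_VOCAB_SIZE + code

def is_audio_token(token_id: int) -> bool:
    return token_id >= TEXT_VOCAB_SIZE
```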
arXiv Detail & Related papers (2023-06-22T14:37:54Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
The proposed text-to-speech architecture is designed for multiple code generation and monotonic alignment.
We show that it outperforms existing TTS systems in several objective and subjective measures.
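The heart of such a vector-quantized front end is mapping continuous speech frames to discrete codebook indices; a minimal numpy sketch (the codebook is assumed already learned, e.g. by VQ-VAE training or k-means):

```python
import numpy as np

def quantize(frames: np.ndarray, codebook: np.ndarray):
    """Map continuous speech frames to discrete codes.

    frames: (T, D) feature vectors; codebook: (K, D) learned code vectors.
    Returns (codes, quantized): codes[t] indexes the nearest code vector,
    quantized[t] is that vector (what a downstream decoder consumes).
    """
    # Squared Euclidean distance from every frame to every codebook entry
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)
    return codes, codebook[codes]
```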
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- Towards Building Text-To-Speech Systems for the Next Billion Users [18.290165216270452]
We evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages.
We train and evaluate TTS models for 13 languages and find our models to significantly improve upon existing models in all languages as measured by mean opinion scores.
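Mean opinion scores are averages of listener ratings and are usually reported with a confidence interval; a minimal numpy sketch of that aggregation (normal approximation, not necessarily the paper's exact protocol):

```python
import numpy as np

def mos_with_ci(ratings, z: float = 1.96):
    """Mean opinion score with a normal-approximation 95% CI.

    ratings: iterable of 1-5 listener scores for one system/language.
    """
    r = np.asarray(list(ratings), dtype=float)
    mean = r.mean()
    half = z * r.std(ddof=1) / np.sqrt(len(r))
    return mean, (mean - half, mean + half)
```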
arXiv Detail & Related papers (2022-11-17T13:59:34Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST), which translates speech from one language into another.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Low-Resource Multilingual and Zero-Shot Multispeaker TTS [25.707717591185386]
We show that it is possible for a system to learn to speak a new language using just 5 minutes of training data.
We show the success of our proposed approach in terms of intelligibility, naturalness and similarity to target speaker.
arXiv Detail & Related papers (2022-10-21T20:03:37Z)
- Building African Voices [125.92214914982753]
This paper focuses on speech synthesis for low-resourced African languages.
We create a set of general-purpose instructions for building speech synthesis systems with minimal technological resources.
We release the speech data, code, and trained voices for 12 African languages to support researchers and developers.
arXiv Detail & Related papers (2022-07-01T23:28:16Z)
- KARI: KAnari/QCRI's End-to-End systems for the INTERSPEECH 2021 Indian Languages Code-Switching Challenge [7.711092265101041]
We present the Kanari/QCRI system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages.
The subtask involved developing a speech recognition system for two CS datasets: Hindi-English and Bengali-English.
To tackle the CS challenges, we use transfer learning for incorporating the publicly available monolingual Hindi, Bengali, and English speech data.
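A typical transfer-learning recipe for this setting initializes the ASR model from weights pretrained on the pooled monolingual data, freezes the earliest layers, and fine-tunes the rest on the code-switched sets. A hedged PyTorch sketch (layer names and the learning rate are illustrative, not from the paper):

```python
import torch

def prepare_for_cs_finetuning(model: torch.nn.Module,
                              freeze_prefixes=("encoder.layers.0", "encoder.layers.1"),
                              lr: float = 5e-5):
    """Transfer-learning setup: start from weights pretrained on pooled
    monolingual Hindi/Bengali/English speech, freeze the earliest encoder
    layers, and fine-tune the rest on the code-switched data.
    """
    for name, param in model.named_parameters():
        # str.startswith accepts a tuple of prefixes
        param.requires_grad = not name.startswith(freeze_prefixes)
    trainable = (p for p in model.parameters() if p.requires_grad)
    return torch.optim.AdamW(trainable, lr=lr)
```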
arXiv Detail & Related papers (2021-06-10T16:12:51Z)