KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
- URL: http://arxiv.org/abs/2104.08459v1
- Date: Sat, 17 Apr 2021 05:49:57 GMT
- Title: KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
- Authors: Saida Mussakhojayeva, Aigerim Janaliyeva, Almas Mirzakhmetov, Yerbolat
Khassanov, Huseyin Atakan Varol
- Abstract summary: This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide.
The dataset consists of about 91 hours of transcribed audio recordings spoken by two professional speakers.
It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech applications in both academia and industry.
- Score: 4.542831770689362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a high-quality open-source speech synthesis dataset for
Kazakh, a low-resource language spoken by over 13 million people worldwide. The
dataset consists of about 91 hours of transcribed audio recordings spoken by
two professional speakers (female and male). It is the first publicly available
large-scale dataset developed to promote Kazakh text-to-speech (TTS)
applications in both academia and industry. In this paper, we share our
experience by describing the dataset development procedures and the challenges
we faced, and discuss important future directions. To demonstrate the
reliability of our dataset, we built baseline end-to-end TTS models and
evaluated them using the subjective mean opinion score (MOS) measure.
Evaluation results show that the best TTS models trained on our dataset achieve
MOS above 4 for both speakers, which makes them applicable for practical use.
The dataset, training recipe, and pretrained TTS models are freely available.
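The subjective MOS evaluation mentioned above aggregates 1-to-5 listener ratings into a per-speaker mean. A minimal sketch of that aggregation, with a normal-approximation 95% confidence interval; the rating values and speaker labels below are invented for illustration and are not from the paper:

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Aggregate 1-5 listener ratings into a MOS with a 95% confidence interval."""
    n = len(ratings)
    mos = statistics.mean(ratings)
    # Normal-approximation interval; reasonable when n is reasonably large
    ci = 1.96 * statistics.stdev(ratings) / math.sqrt(n) if n > 1 else 0.0
    return mos, ci

# Hypothetical ratings for the two speakers (illustrative values only)
female_ratings = [5, 4, 5, 4, 4, 5, 3, 5, 4, 5]
male_ratings = [4, 5, 4, 4, 5, 4, 5, 3, 4, 4]

for name, ratings in [("female", female_ratings), ("male", male_ratings)]:
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

A MOS above 4 on this 1-5 scale is conventionally read as "good" quality, which is the threshold the abstract uses to argue the models are practically usable.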
Related papers
- Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments [8.103855990028842]
We introduce Synth4Kws - a framework to leverage Text to Speech (TTS) synthesized data for custom KWS.
We found increasing TTS phrase diversity and utterance sampling monotonically improves model performance.
Our experiments are based on English and single-word utterances, but the findings generalize to i18n languages.
arXiv Detail & Related papers (2024-07-23T21:05:44Z)
- Towards Zero-Shot Text-To-Speech for Arabic Dialects [16.10882912169842]
Zero-shot multi-speaker text-to-speech (ZS-TTS) systems have advanced for English; however, Arabic still lags behind due to insufficient resources.
We address this gap for Arabic by first adapting an existing dataset to suit the needs of speech synthesis.
We employ a set of Arabic dialect identification models to explore the impact of pre-defined dialect labels on improving the ZS-TTS model in a multi-dialect setting.
arXiv Detail & Related papers (2024-06-24T15:58:15Z)
- Textually Pretrained Speech Language Models [107.10344535390956]
We propose TWIST, a method for training SpeechLMs using a warm start from a pretrained textual language model.
We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board.
arXiv Detail & Related papers (2023-05-22T13:12:16Z)
- ArmanTTS single-speaker Persian dataset [2.7836084563851284]
This paper introduces the single speaker dataset: ArmanTTS.
We show that ArmanTTS meets the standards necessary for training a Persian text-to-speech model.
We also combined Tacotron 2 and HiFi-GAN to design a model that receives phonemes as input and outputs the corresponding speech.
arXiv Detail & Related papers (2023-04-07T10:52:55Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech [37.942466944970704]
This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models.
To train a TTS model from various types of speech and text data, different training schemes are designed to handle supervised (TTS and ASR data) and unsupervised (untranscribed speech and unspoken text) datasets.
Experimental evaluation shows that multilingual TTS models trained on Virtuoso can achieve significantly better naturalness and intelligibility than baseline ones in seen languages.
arXiv Detail & Related papers (2022-10-27T14:09:48Z)
- MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline [16.95694149810552]
This paper introduces a high-quality open-source text-to-speech dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide.
The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer.
It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry.
arXiv Detail & Related papers (2022-09-22T08:24:43Z)
- ASR2K: Speech Recognition for Around 2000 Languages without Audio [100.41158814934802]
We present a speech recognition pipeline that does not require any audio for the target language.
Our pipeline consists of three components: acoustic, pronunciation, and language models.
We build speech recognition systems for 1909 languages by combining the pipeline with Crubadan, a large n-gram database of endangered languages.
arXiv Detail & Related papers (2022-09-06T22:48:29Z)
- SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis [50.236929707024245]
The SOMOS dataset is the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples.
It consists of 20K synthetic utterances of the LJ Speech voice, a public-domain speech dataset.
arXiv Detail & Related papers (2022-04-06T18:45:20Z)
- Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
- CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus [57.641761472372814]
CoVoST is a multilingual speech-to-text translation corpus from 11 languages into English.
It is diversified with over 11,000 speakers and over 60 accents.
CoVoST is released under a CC0 license and is free to use.
arXiv Detail & Related papers (2020-02-04T14:35:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.