KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data,
Speakers, and Topics
- URL: http://arxiv.org/abs/2201.05771v1
- Date: Sat, 15 Jan 2022 06:54:30 GMT
- Title: KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data,
Speakers, and Topics
- Authors: Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol
- Abstract summary: We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus.
In the new KazakhTTS2 corpus, the overall size is increased from 93 hours to 271 hours.
The number of speakers has risen from two to five (three females and two males), and the topic coverage is diversified with the help of new sources, including a book and Wikipedia articles.
- Score: 4.859986264602551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an expanded version of our previously released Kazakh
text-to-speech (KazakhTTS) synthesis corpus. In the new KazakhTTS2 corpus, the
overall size is increased from 93 hours to 271 hours, the number of speakers
has risen from two to five (three females and two males), and the topic
coverage is diversified with the help of new sources, including a book and
Wikipedia articles. This corpus is necessary for building high-quality TTS
systems for Kazakh, a Central Asian agglutinative language from the Turkic
family, which presents several linguistic challenges. We describe the corpus
construction process and provide the details of the training and evaluation
procedures for the TTS system. Our experimental results indicate that the
constructed corpus is sufficient to build robust TTS models for real-world
applications, with a subjective mean opinion score of above 4.0 for all the
five speakers. We believe that our corpus will facilitate speech and language
research for Kazakh and other Turkic languages, which are widely considered to
be low-resource due to the limited availability of free linguistic data. The
constructed corpus, code, and pretrained models are publicly available in our
GitHub repository.
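The abstract reports a subjective mean opinion score (MOS) above 4.0 for all five speakers. As a minimal illustrative sketch (not the authors' evaluation code), MOS is simply the average of listener ratings on a 1-5 scale:

```python
def mean_opinion_score(ratings):
    """Average a list of 1-5 listener ratings into a MOS value."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return sum(ratings) / len(ratings)

# Hypothetical ratings for one speaker's synthesized utterances.
ratings = [5, 4, 4, 5, 3, 4, 5, 4]
print(round(mean_opinion_score(ratings), 2))  # 4.25
```

In practice, MOS studies average over many listeners and utterances per speaker; a score above 4.0 ("good" on the standard scale) indicates the synthesized speech is close to natural.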
Related papers
- SeamlessM4T: Massively Multilingual & Multimodal Machine Translation [90.71078166159295]
We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-text translation, and automatic speech recognition for up to 100 languages.
We developed the first multilingual system capable of translating from and into English for both speech and text.
On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.
arXiv Detail & Related papers (2023-08-22T17:44:18Z) - Multilingual Text-to-Speech Synthesis for Turkic Languages Using
Transliteration [3.0122461286351796]
This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages.
We specifically target the zero-shot learning scenario, where a TTS model trained using the data of one language is applied to synthesise speech for other, unseen languages.
An end-to-end TTS system based on the Tacotron 2 architecture was trained using only the available data of the Kazakh language.
arXiv Detail & Related papers (2023-05-25T05:57:54Z) - Textless Speech-to-Speech Translation With Limited Parallel Data [51.3588490789084]
PFB is a framework for training textless S2ST models that requires just dozens of hours of parallel speech data.
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains.
arXiv Detail & Related papers (2023-05-24T17:59:05Z) - ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus [3.1925030748447747]
We present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic.
The speech is extracted from a LibriVox audiobook, which is then processed, segmented, and manually transcribed and annotated.
The final ClArTTS corpus contains about 12 hours of speech from a single male speaker sampled at 40100 Hz.
arXiv Detail & Related papers (2023-02-28T20:18:59Z) - Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z) - Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for
Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z) - Kosp2e: Korean Speech to English Translation Corpus [11.44330742875498]
We introduce kosp2e, a corpus that allows Korean speech to be translated into English text in an end-to-end manner.
We adopt an open-license speech recognition corpus, a translation corpus, and spoken language corpora to make our dataset freely available to the public.
arXiv Detail & Related papers (2021-07-06T20:34:06Z) - KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset [4.542831770689362]
This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide.
The dataset consists of about 91 hours of transcribed audio recordings spoken by two professional speakers.
It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech applications in both academia and industry.
arXiv Detail & Related papers (2021-04-17T05:49:57Z) - Large-Scale Self- and Semi-Supervised Learning for Speech Translation [48.06478781295623]
We explore both pretraining and self-training by using the large Libri-Light speech audio corpus and language modeling with CommonCrawl.
Our experiments improve over the previous state of the art by 2.6 BLEU on average on all four considered CoVoST 2 language pairs.
arXiv Detail & Related papers (2021-04-14T07:44:52Z) - A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech
Recognition Baseline [4.521450956414864]
The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups.
The KSC is the largest publicly available database developed to advance various Kazakh speech and language processing applications.
arXiv Detail & Related papers (2020-09-22T05:57:15Z) - CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus [57.641761472372814]
CoVoST is a multilingual speech-to-text translation corpus from 11 languages into English.
It is diversified with over 11,000 speakers and over 60 accents.
CoVoST is released under CC0 license and free to use.
arXiv Detail & Related papers (2020-02-04T14:35:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.