MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
- URL: http://arxiv.org/abs/2209.10848v1
- Date: Thu, 22 Sep 2022 08:24:43 GMT
- Title: MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
- Authors: Yifan Hu, Pengkai Yin, Rui Liu, Feilong Bao and Guanglai Gao
- Abstract summary: This paper introduces a high-quality open-source text-to-speech dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide.
The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer.
It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a high-quality open-source text-to-speech (TTS)
synthesis dataset for Mongolian, a low-resource language spoken by over 10
million people worldwide. The dataset, named MnTTS, consists of about 8 hours
of transcribed audio recordings spoken by a 22-year-old professional female
Mongolian announcer. It is the first publicly available dataset developed to
promote Mongolian TTS applications in both academia and industry. In this
paper, we share our experience by describing the dataset development procedures
and the challenges we faced. To demonstrate the reliability of our dataset, we
built a powerful non-autoregressive baseline system based on the FastSpeech2
model and the HiFi-GAN vocoder, and evaluated it using the subjective mean
opinion score (MOS) and real-time factor (RTF) metrics. Evaluation results show
that the baseline system trained on our dataset achieves an MOS above 4 and an
RTF of about 0.330, which makes it suitable for practical use. The dataset,
training recipe, and pretrained TTS models are freely available at
https://github.com/walker-hyf/MnTTS.
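The two evaluation metrics named in the abstract can be made concrete with a short sketch. RTF is conventionally the wall-clock synthesis time divided by the duration of the generated audio, so an RTF of about 0.33 means one second of speech takes roughly a third of a second to produce; values below 1 are faster than real time. MOS is the mean of listener ratings, usually on a 1 to 5 scale, and is often reported with a 95% confidence interval. The helper names, the normal-approximation interval, and the hypothetical `synthesize` callable are assumptions for illustration only; this is not code from the MnTTS recipe, and the paper's exact evaluation protocol may differ.

```python
from __future__ import annotations

import math
import statistics
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio.

    Values below 1.0 mean speech is produced faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical text -> waveform function
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and convert the waveform length to seconds."""
    start = time.perf_counter()
    waveform = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(waveform) / sample_rate)


def mos_with_ci(ratings: Sequence[float], z: float = 1.96) -> tuple[float, float]:
    """Mean opinion score plus the half-width of a normal-approximation CI.

    z = 1.96 gives an approximate 95% interval; ratings are assumed to be
    on the usual 1-5 absolute category rating scale.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a sample standard deviation")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


if __name__ == "__main__":
    # Toy numbers: 1.65 s of compute for 5 s of audio gives an RTF of about 0.33.
    print(f"RTF: {real_time_factor(1.65, 5.0):.2f}")
    mean, hw = mos_with_ci([4, 5, 4, 4, 5])
    print(f"MOS: {mean:.2f} ± {hw:.2f}")
```

With the toy ratings `[4, 5, 4, 4, 5]`, `mos_with_ci` returns a mean of 4.4 with a half-width of about 0.48. Real MOS studies average over many listeners and utterances, which shrinks the interval.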
Related papers
- Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
This research introduces a comprehensive Bahasa text-to-speech dataset and a novel TTS model, EnGen-TTS.
The proposed EnGen-TTS model performs better than established baselines, achieving a Mean Opinion Score (MOS) of 4.45 ± 0.13.
This research marks a significant advancement in Bahasa TTS technology, with implications for diverse language applications.
(2024-10-09T07:01:05Z)
- SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
SpoofCeleb is a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV).
We utilize source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data.
SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions.
(2024-09-18T23:17:02Z)
- Text-To-Speech Synthesis In The Wild
Text-to-speech (TTS) systems are traditionally trained using modest databases of studio-quality, prompted or read speech collected in benign acoustic environments such as anechoic rooms.
We introduce the TTS In the Wild (TITW) dataset, the result of a fully automated pipeline, applied to the VoxCeleb1 dataset commonly used for speaker recognition.
We show that a number of recent TTS models can be trained successfully using TITW-Easy, but that it remains extremely challenging to produce similar results using TITW-Hard.
(2024-09-13T10:58:55Z)
- IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS
IndicVoices-R (IV-R) is the largest multilingual Indian TTS dataset derived from an ASR dataset.
IV-R matches the quality of gold-standard TTS datasets such as LJSpeech, LibriTTS, and IndicTTS.
We release the first TTS model for all 22 official Indian languages.
(2024-09-09T06:28:47Z)
- Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language.
The use of text-only data allows the development of TTS systems for low-resource languages.
Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.
(2023-01-30T00:53:50Z)
- MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide.
We release an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for the benefit of related researchers.
In this work, we prepare transcriptions covering various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 hours of Mongolian speech, resulting in 30 hours in total.
(2022-12-11T14:55:02Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution spanning training data collection, modeling choices, and benchmark dataset release.
(2022-11-11T20:21:38Z)
- KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide.
The dataset consists of about 91 hours of transcribed audio recordings spoken by two professional speakers.
It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech applications in both academia and industry.
(2021-04-17T05:49:57Z)
- Facebook AI's WMT20 News Translation Task Submission
This paper describes Facebook AI's submission to the WMT20 shared news translation task.
We focus on the low resource setting and participate in two language pairs, Tamil -> English and Inuktitut -> English.
We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain.
(2020-11-16T21:49:00Z)
- A Sentence Cloze Dataset for Chinese Machine Reading Comprehension
We propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC).
The proposed task aims to fill the correct candidate sentences into a passage that contains several blanks.
We built a Chinese dataset called CMRC 2019 to evaluate the difficulty of the SC-MRC task.
(2020-04-07T04:09:00Z)