CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource
Languages
- URL: http://arxiv.org/abs/2306.10097v1
- Date: Fri, 16 Jun 2023 17:17:06 GMT
- Title: CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource
Languages
- Authors: Frederico S. Oliveira, Edresson Casanova, Arnaldo Cândido Júnior,
Anderson S. Soares, and Arlindo R. Galvão Filho
- Abstract summary: CML-TTS is a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias (UFG)
CML-TTS is based on Multilingual LibriSpeech (MLS) and adapted for training TTS models, consisting of audiobooks in seven languages: Dutch, French, German, Italian, Portuguese, Polish, and Spanish.
We provide the YourTTS model, a multi-lingual TTS model, trained using 3,176.13 hours from CML-TTS and also with 245.07 hours from LibriTTS, in English.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we present CML-TTS, a recursive acronym for
CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the
Center of Excellence in Artificial Intelligence (CEIA) of the Federal
University of Goias (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS)
and adapted for training TTS models, consisting of audiobooks in seven
languages: Dutch, French, German, Italian, Portuguese, Polish, and Spanish.
Additionally, we provide the YourTTS model, a multi-lingual TTS model, trained
using 3,176.13 hours from CML-TTS and also with 245.07 hours from LibriTTS, in
English. Our purpose in creating this dataset is to open up new research
possibilities in the TTS area for multi-lingual models. The dataset is publicly
available under the CC-BY 4.0 license.
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization
Cross-lingual summarization (CLS) aims to generate a summary of a source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- HLTCOE at TREC 2023 NeuCLIR Track
The HLTCOE team applied PLAID, an mT5 reranker, and document translation to the TREC 2023 NeuCLIR track.
For PLAID, we included a variety of models and training techniques: the English model released with ColBERT v2, translate-train (TT), translate-distill (TD), and multilingual translate-train (MTT).
arXiv Detail & Related papers (2024-04-11T20:46:18Z) - Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language.
In this work, we thus investigate multilingual TTI and the current potential of neural machine translation (NMT) to bootstrap mTTI systems.
We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z) - Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with
Unsupervised Text Pretraining [65.30528567491984]
This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language.
The use of text-only data allows the development of TTS systems for low-resource languages.
Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.
arXiv Detail & Related papers (2023-01-30T00:53:50Z) - MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis
Dataset [19.086710703808794]
Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide.
We make public an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for the benefit of related researchers.
In this work, we prepare transcriptions covering various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 hours of speech in Mongolian, resulting in 30 hours in total.
arXiv Detail & Related papers (2022-12-11T14:55:02Z) - Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised
Learning for Text-To-Speech [37.942466944970704]
This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models.
To train a TTS model from various types of speech and text data, different training schemes are designed to handle supervised (TTS and ASR data) and unsupervised (untranscribed speech and unspoken text) datasets.
Experimental evaluation shows that multilingual TTS models trained on Virtuoso can achieve significantly better naturalness and intelligibility than baseline ones in seen languages.
arXiv Detail & Related papers (2022-10-27T14:09:48Z) - Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding [55.989376102986654]
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech problem under the few-shot setting.
We propose a framework that consists of a phoneme-based TTS model and a codebook module to project phonemes from different languages into a learned latent space.
arXiv Detail & Related papers (2022-06-27T11:24:40Z) - Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural
Machine Translation [74.158365847236]
SixT++ is a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages.
It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU, respectively.
arXiv Detail & Related papers (2021-10-16T10:59:39Z) - MLS: A Large-Scale Multilingual Dataset for Speech Research [37.803100082550294]
The dataset is derived from read audiobooks from LibriVox.
It consists of 8 languages, including about 44.5K hours of English and a total of about 6K hours for other languages.
arXiv Detail & Related papers (2020-12-07T01:53:45Z) - CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus [57.641761472372814]
CoVoST is a multilingual speech-to-text translation corpus from 11 languages into English.
It is diversified with over 11,000 speakers and over 60 accents.
CoVoST is released under the CC0 license and is free to use.
arXiv Detail & Related papers (2020-02-04T14:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.