XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
- URL: http://arxiv.org/abs/2305.19709v1
- Date: Wed, 31 May 2023 10:05:33 GMT
- Title: XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
- Authors: Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen
- Abstract summary: We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.
XPhoneBERT has the same model architecture as BERT-base and is trained with the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales.
- Score: 15.254598796939922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. XPhoneBERT has the same model architecture as BERT-base and is trained with the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encoder significantly improves a strong neural TTS model's naturalness and prosody, and also helps it produce fairly high-quality speech with limited training data. We publicly release our pre-trained XPhoneBERT in the hope that it will facilitate future research and downstream TTS applications for multiple languages. Our XPhoneBERT model is available at https://github.com/VinAIResearch/XPhoneBERT
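As a rough usage illustration, the sketch below loads the released checkpoint and encodes one sentence at the phoneme level. It assumes the checkpoint is also published on the Hugging Face Hub as vinai/xphonebert-base and that the authors' text2phonemesequence package handles grapheme-to-phoneme conversion; the checkpoint id, language code, and helper API are assumptions to verify against the repository README.

```python
# Minimal sketch (not the authors' official snippet): encode a sentence with XPhoneBERT.
# Assumed Hub checkpoint id "vinai/xphonebert-base" and assumed text2phonemesequence API;
# check https://github.com/VinAIResearch/XPhoneBERT for the exact identifiers.
import torch
from transformers import AutoModel, AutoTokenizer
from text2phonemesequence import Text2PhonemeSequence  # assumed G2P helper package

tokenizer = AutoTokenizer.from_pretrained("vinai/xphonebert-base")
model = AutoModel.from_pretrained("vinai/xphonebert-base")

# Convert raw text (word-segmented / normalized) into a phoneme sequence.
# The language code "eng-us" is an assumption for English.
g2p = Text2PhonemeSequence(language="eng-us", is_cuda=False)
phonemes = g2p.infer_sentence("XPhoneBERT learns phoneme representations for TTS .")

# Tokenize the phoneme string and extract contextual phoneme embeddings.
inputs = tokenizer(phonemes, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768) for a BERT-base encoder

print(hidden.shape)
```

In a TTS pipeline, these contextual hidden states would stand in for (or initialize) the phoneme encoder of the synthesis model, which is the setup behind the naturalness and prosody gains reported in the abstract.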
Related papers
- Pheme: Efficient and Conversational Speech Generation [52.34331755341856]
We introduce the Pheme model series that offers compact yet high-performing conversational TTS models.
It can be trained efficiently on smaller-scale conversational data, cutting data demands by more than 10x while still matching the quality of autoregressive TTS models.
arXiv Detail & Related papers (2024-01-05T14:47:20Z) - Textless Speech-to-Speech Translation With Limited Parallel Data [51.3588490789084]
PFB is a framework for training textless S2ST models that require just dozens of hours of parallel speech data.
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains.
arXiv Detail & Related papers (2023-05-24T17:59:05Z) - Textually Pretrained Speech Language Models [107.10344535390956]
We propose TWIST, a method for training SpeechLMs using a warm start from pretrained textual language models.
We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board.
arXiv Detail & Related papers (2023-05-22T13:12:16Z) - Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with
Unsupervised Text Pretraining [65.30528567491984]
This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language.
The use of text-only data allows the development of TTS systems for low-resource languages.
Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.
arXiv Detail & Related papers (2023-01-30T00:53:50Z) - Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers [92.55131711064935]
We introduce a language modeling approach for text-to-speech (TTS) synthesis.
Specifically, we train a neural language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model.
VALL-E exhibits in-context learning capabilities and can be used to synthesize high-quality personalized speech.
arXiv Detail & Related papers (2023-01-05T15:37:15Z) - TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for
Multilingual Tweet Representations at Twitter [31.698196219228024]
We present TwHIN-BERT, a multilingual language model productionized at Twitter.
Our model is trained on 7 billion tweets covering over 100 distinct languages.
We evaluate our model on various multilingual social recommendation and semantic understanding tasks.
arXiv Detail & Related papers (2022-09-15T19:01:21Z) - ASR-Generated Text for Language Model Pre-training Applied to Speech
Tasks [20.83731188652985]
We leverage the INA (French National Audiovisual Institute) collection and obtain 19GB of text after applying ASR on 350,000 hours of diverse TV shows.
New models (FlauBERT-Oral) are shared with the community and evaluated for 3 downstream tasks: spoken language understanding, classification of TV shows and speech syntactic parsing.
arXiv Detail & Related papers (2022-07-05T08:47:51Z) - Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme
Representations for Text to Speech [104.65639892109381]
We propose Mixed-Phoneme BERT, a novel variant of the BERT model that uses mixed phoneme and sup-phoneme representations to enhance its learning capability.
Experimental results demonstrate that the proposed Mixed-Phoneme BERT significantly improves TTS performance, with a 0.30 CMOS gain over the FastSpeech 2 baseline.
arXiv Detail & Related papers (2022-03-31T17:12:26Z) - GottBERT: a pure German Language Model [0.0]
No German single-language RoBERTa model had been published before this work, in which we introduce one (GottBERT).
In an evaluation, we compare its performance on two Named Entity Recognition (NER) tasks, CoNLL 2003 and GermEval 2014, as well as on the text classification tasks GermEval 2018 (fine and coarse) and GNAD, against existing German single-language BERT models and two multilingual ones.
GottBERT was successfully pre-trained on a 256-core TPU pod using the RoBERTa BASE architecture.
arXiv Detail & Related papers (2020-12-03T17:45:03Z) - PhoBERT: Pre-trained language models for Vietnamese [11.685916685552982]
We present PhoBERT (released in base and large versions), the first public large-scale monolingual language models pre-trained for Vietnamese.
Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R.
We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP.
arXiv Detail & Related papers (2020-03-02T10:21:17Z)