A Comparative Analysis of Pretrained Language Models for Text-to-Speech
- URL: http://arxiv.org/abs/2309.01576v1
- Date: Mon, 4 Sep 2023 13:02:27 GMT
- Title: A Comparative Analysis of Pretrained Language Models for Text-to-Speech
- Authors: Marcel Granero-Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell,
Nicole Peinelt, Alexis Moinet, Thomas Drugman
- Abstract summary: State-of-the-art text-to-speech (TTS) systems have utilized pretrained language models (PLMs) to enhance prosody and create more natural-sounding speech.
While PLMs have been extensively researched for natural language understanding (NLU), their impact on TTS has been overlooked.
This study is the first of its kind to investigate the impact of different PLMs on TTS.
- Score: 13.962029761484022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art text-to-speech (TTS) systems have utilized pretrained
language models (PLMs) to enhance prosody and create more natural-sounding
speech. However, while PLMs have been extensively researched for natural
language understanding (NLU), their impact on TTS has been overlooked. In this
study, we aim to address this gap by conducting a comparative analysis of
different PLMs for two TTS tasks: prosody prediction and pause prediction.
Firstly, we trained a prosody prediction model using 15 different PLMs. Our
findings revealed a logarithmic relationship between model size and quality, as
well as significant performance differences between neutral and expressive
prosody. Secondly, we employed PLMs for pause prediction and found that the
task was less sensitive to small models. We also identified a strong
correlation between our empirical results and the GLUE scores obtained for
these language models. To the best of our knowledge, this is the first study of
its kind to investigate the impact of different PLMs on TTS.
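For intuition only, the short sketch below (not the authors' code; all numbers and model sizes are hypothetical) shows how the two reported analyses could be reproduced in principle: fitting a logarithmic curve of task quality against PLM parameter count, and correlating task quality with GLUE scores.

```python
# Hypothetical sketch of the two analyses described in the abstract:
# (1) a logarithmic fit of quality vs. model size, (2) correlation with GLUE.
# All numbers below are made up for illustration; they are not the paper's results.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical PLM parameter counts and a prosody-prediction quality metric.
params = np.array([14e6, 66e6, 110e6, 340e6, 1.5e9])
quality = np.array([0.61, 0.68, 0.71, 0.75, 0.79])

# Logarithmic relationship: quality ~= a * log(params) + b.
a, b = np.polyfit(np.log(params), quality, deg=1)
print(f"fit: quality ~= {a:.3f} * log(params) + {b:.3f}")

# Correlation between task quality and (hypothetical) GLUE scores.
glue = np.array([77.0, 81.2, 83.1, 86.4, 88.9])
r, p = pearsonr(quality, glue)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```

A near-linear trend in log(params) would reflect the diminishing returns with model size that the abstract describes, and a high Pearson r would mirror the reported agreement between task performance and GLUE scores.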
Related papers
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
- Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study [0.0]
Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP).
This study addresses the underexplored area of evaluating Large Language Models (LLMs) in low-resourced languages like Bengali.
arXiv Detail & Related papers (2024-05-05T13:57:05Z)
- Heuristic-enhanced Candidates Selection strategy for GPTs tackle Few-Shot Aspect-Based Sentiment Analysis [1.5020330976600738]
The paper designs a Heuristic-enhanced Candidates Selection strategy and further proposes the All in One (AiO) model based on it.
The model works in two stages, simultaneously accommodating the accuracy of PLMs and the capability of generalization.
The experimental results demonstrate that the proposed model can better adapt to multiple sub-tasks, and also outperforms the methods that directly utilize GPTs.
arXiv Detail & Related papers (2024-04-09T07:02:14Z)
- Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures [25.210837736795565]
We train a series of GPT-like language models of different sizes on the strict version of the BabyLM pretraining corpus.
We find a positive correlation between LM size and performance on all three challenge tasks, with different preferences for model width and depth in each of the tasks.
This suggests that modelling processing effort and linguistic competence may require an approach different from training GPT-like LMs on a developmentally plausible corpus.
arXiv Detail & Related papers (2023-11-08T09:26:27Z)
- Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning [0.0]
We evaluate the necessity of images for stance prediction in tweets.
Our work suggests an ensemble of fine-tuned text-based language models.
Our findings suggest that the multimodal models tend to perform better when image content is summarized as natural language.
arXiv Detail & Related papers (2023-10-11T00:18:29Z)
- How Does Pretraining Improve Discourse-Aware Translation? [41.20896077662125]
We introduce a probing task to interpret the ability of pretrained language models to capture discourse relation knowledge.
We validate three state-of-the-art PLMs across encoder-, decoder-, and encoder-decoder-based models.
Our findings are instructive to understand how and when discourse knowledge in PLMs should work for downstream tasks.
arXiv Detail & Related papers (2023-05-31T13:36:51Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z)
- ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models [78.08792285698853]
We present a large-scale empirical study on general language ability evaluation of pretrained language models (ElitePLM).
Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; and (3) PLMs have excellent transferability between similar tasks.
arXiv Detail & Related papers (2022-05-03T14:18:10Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze the input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)