Investigating the Utility of Surprisal from Large Language Models for
Speech Synthesis Prosody
- URL: http://arxiv.org/abs/2306.09814v1
- Date: Fri, 16 Jun 2023 12:49:44 GMT
- Title: Investigating the Utility of Surprisal from Large Language Models for
Speech Synthesis Prosody
- Authors: Sofoklis Kakouros, Juraj Šimko, Martti Vainio, Antti Suni
- Abstract summary: This paper investigates the use of word surprisal, a measure of the predictability of a word in a given context, as a feature to aid speech prosody synthesis.
We conduct experiments using a large corpus of English text and large language models (LLMs) of varying sizes.
We find that word surprisal and word prominence are moderately correlated, suggesting that they capture related but distinct aspects of language use.
- Score: 4.081433571732691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the use of word surprisal, a measure of the
predictability of a word in a given context, as a feature to aid speech
synthesis prosody. We explore how word surprisal extracted from large language
models (LLMs) correlates with word prominence, a signal-based measure of the
salience of a word in a given discourse. We also examine how context length and
LLM size affect the results, and how a speech synthesizer conditioned with
surprisal values compares with a baseline system. To evaluate these factors, we
conducted experiments using a large corpus of English text and LLMs of varying
sizes. Our results show that word surprisal and word prominence are moderately
correlated, suggesting that they capture related but distinct aspects of
language use. We find that length of context and size of the LLM impact the
correlations, but not in the direction anticipated, with longer contexts and
larger LLMs generally underpredicting prominent words in a nearly linear
manner. We demonstrate that, in line with these findings, a speech synthesizer
conditioned with surprisal values provides only a minimal improvement over the
baseline, with the results suggesting a limited effect of using surprisal
values for eliciting appropriate prominence patterns.
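As a rough illustration of the surprisal measure discussed above, the sketch below computes per-word surprisal, -log P(word | preceding context), from a small causal LM and then rank-correlates it with word-prominence scores. This is a minimal sketch, not the authors' code: the model choice (GPT-2), the context-truncation parameter, and the prominence values are illustrative assumptions; the paper compares LLMs of several sizes and derives prominence from the speech signal.

```python
import math

import torch
from scipy.stats import spearmanr
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model for illustration; the paper compares LLMs of varying sizes.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def word_surprisals(words, max_context_tokens=64):
    """Return per-word surprisal, -log2 P(word | preceding context), in bits.

    Subword surprisals are summed per word, and the left context is truncated
    to the most recent `max_context_tokens` tokens (one simple way to vary
    context length).
    """
    surprisals = []
    context_ids = [tokenizer.bos_token_id]  # start-of-text token as initial context
    for word in words:
        word_ids = tokenizer.encode(" " + word)  # leading space: word-initial BPE pieces
        total = 0.0
        for wid in word_ids:
            ids = context_ids[-max_context_tokens:]
            with torch.no_grad():
                logits = model(torch.tensor([ids])).logits[0, -1]
            log_probs = torch.log_softmax(logits, dim=-1)
            total += -log_probs[wid].item() / math.log(2)  # nats -> bits
            context_ids.append(wid)
        surprisals.append(total)
    return surprisals


words = "the cat sat on the mat".split()
surprisal = word_surprisals(words, max_context_tokens=32)

# Hypothetical prominence scores; in the paper these come from a signal-based
# prominence estimator over the speech, not from hand-picked numbers like these.
prominence = [0.2, 0.9, 0.6, 0.1, 0.3, 0.8]
rho, p_value = spearmanr(surprisal, prominence)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

Varying `max_context_tokens` and swapping in larger checkpoints would be one way to probe the context-length and model-size effects the abstract describes.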
Related papers
- Investigating large language models for their competence in extracting grammatically sound sentences from transcribed noisy utterances [1.3597551064547497]
Humans exhibit remarkable cognitive abilities to separate semantically significant content from speech-specific noise.
We investigate whether large language models (LLMs) can effectively perform analogous speech comprehension tasks.
arXiv Detail & Related papers (2024-10-07T14:55:20Z)
- Confabulation: The Surprising Value of Large Language Model Hallucinations [0.7249731529275342]
We argue that measurable semantic characteristics of LLM confabulations mirror a human propensity to utilize increased narrativity as a cognitive resource for sense-making and communication.
This finding reveals a tension in our usually dismissive understandings of confabulation.
arXiv Detail & Related papers (2024-06-06T15:32:29Z)
- Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs [101.51435599249234]
We propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by a large language model (LLM).
Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects.
Experiments show that the clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of detailed inference patterns encoded by LLMs.
arXiv Detail & Related papers (2024-05-20T08:51:03Z)
- Word Importance Explains How Prompts Affect Language Model Outputs [0.7223681457195862]
This study presents a method to improve the explainability of large language models by varying individual words in prompts.
Unlike classical attention, word importance measures the impact of prompt words on arbitrarily-defined text scores.
Results show that word importance scores are closely related to the expected suffix importances for multiple scoring functions.
arXiv Detail & Related papers (2024-03-05T15:04:18Z)
- Comparing Hallucination Detection Metrics for Multilingual Generation [62.97224994631494]
This paper assesses how well various factual hallucination detection metrics identify hallucinations in generated biographical summaries across languages.
We compare how well automatic metrics correlate to each other and whether they agree with human judgments of factuality.
Our analysis reveals that while the lexical metrics are ineffective, NLI-based metrics perform well, correlating with human annotations in many settings and often outperforming supervised models.
arXiv Detail & Related papers (2024-02-16T08:10:34Z)
- Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study [3.0059120458540383]
We consider the evaluation of the lexical richness of the text generated by conversational Large Language Models (LLMs) and how it depends on the model parameters.
The results show how lexical richness depends on the version of ChatGPT and some of its parameters, such as the presence penalty, or on the role assigned to the model.
arXiv Detail & Related papers (2024-02-11T13:41:17Z)
- Zero-shot Causal Graph Extrapolation from Text via LLMs [50.596179963913045]
We evaluate the ability of large language models (LLMs) to infer causal relations from natural language.
LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples.
We extend our approach to extrapolating causal graphs through iterated pairwise queries.
arXiv Detail & Related papers (2023-12-22T13:14:38Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affect the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- Evaluating statistical language models as pragmatic reasoners [39.72348730045737]
We evaluate the capacity of large language models to infer meanings of pragmatic utterances.
We find that LLMs can derive context-grounded, human-like distributions over the interpretations of several complex pragmatic utterances.
Results inform the inferential capacity of statistical language models, and their use in pragmatic and semantic parsing applications.
arXiv Detail & Related papers (2023-05-01T18:22:10Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)