TSST: A Benchmark and Evaluation Models for Text Speech-Style Transfer
- URL: http://arxiv.org/abs/2311.08389v1
- Date: Tue, 14 Nov 2023 18:50:51 GMT
- Title: TSST: A Benchmark and Evaluation Models for Text Speech-Style Transfer
- Authors: Huashan Sun, Yixiao Wu, Yinghao Li, Jiawei Li, Yizhe Yang, Yang Gao
- Abstract summary: We introduce a novel task called Text Speech-Style Transfer (TSST).
The main objective is to explore topics related to human cognition, such as personality and emotion, based on the capabilities of existing language models.
We thoroughly analyze the performance of several large language models (LLMs) and identify areas where further improvement is needed.
- Score: 17.888328120571245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text style is highly abstract, as it encompasses various aspects of a
speaker's characteristics, habits, logical thinking, and the content they
express. However, previous text-style transfer tasks have primarily focused on
data-driven approaches, lacking in-depth analysis and research from the
perspectives of linguistics and cognitive science. In this paper, we introduce
a novel task called Text Speech-Style Transfer (TSST). The main objective is to
further explore topics related to human cognition, such as personality and
emotion, based on the capabilities of existing LLMs. Considering the objective
of our task and the distinctive characteristics of oral speech in real-life
scenarios, we trained multi-dimensional evaluation models (covering filler words,
vividness, interactivity, and emotionality) for TSST and validated their
correlation with human assessments. We thoroughly analyze the performance of
several large language models (LLMs) and identify areas where further
improvement is needed. Moreover, driven by our evaluation models, we have
released a new corpus that improves the capabilities of LLMs in generating text
with speech-style characteristics. In summary, we present the TSST task, a new
benchmark for style transfer that emphasizes human-oriented evaluation, and we
use it to explore and advance the performance of current LLMs.
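As a concrete illustration of the human-correlation check described in the abstract, here is a minimal sketch assuming a simple scorer interface; the `scorers` mapping is a hypothetical stand-in for the trained per-dimension evaluation models, not the paper's released code.

```python
# Hypothetical sketch: correlating automatic TSST dimension scores with
# human ratings. The scorer interface below is an assumption, not the
# paper's actual implementation.
from scipy.stats import spearmanr

DIMENSIONS = ["filler_words", "vividness", "interactivity", "emotionality"]

def validate_scorers(texts, human_ratings, scorers):
    """Report rank correlation between model scores and human ratings.

    texts: list of transferred outputs to evaluate
    human_ratings: dict mapping each dimension to a list of human scores
    scorers: dict mapping each dimension to a callable(str) -> float
    """
    results = {}
    for dim in DIMENSIONS:
        model_scores = [scorers[dim](text) for text in texts]
        rho, p_value = spearmanr(model_scores, human_ratings[dim])
        results[dim] = (rho, p_value)
        print(f"{dim}: Spearman rho = {rho:.3f} (p = {p_value:.3g})")
    return results
```

Spearman's rank correlation is a natural fit here because human style ratings are ordinal; the paper's actual correlation measure may differ.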
Related papers
- BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks [2.9873893715462176]
This work introduces a novel framework named BiosERC, which investigates speaker characteristics in a conversation.
By employing Large Language Models (LLMs), we extract the "biographical information" of the speaker within a conversation.
Our proposed method achieves state-of-the-art (SOTA) results on three widely used benchmark datasets.
arXiv Detail & Related papers (2024-07-05T06:25:34Z)
- Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
- Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback [39.54647336161013]
We propose a sampling-annotating-learning framework tailored to text-to-speech (TTS) optimization.
We show that UNO considerably improves the zero-shot performance of TTS models in terms of MOS, word error rate, and speaker similarity.
We also show that UNO can adapt seamlessly and flexibly to a desired speaking style in emotional TTS.
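Of the metrics named above, word error rate is simple enough to sketch directly; the following is a generic word-level edit-distance implementation, not UNO's evaluation code.

```python
# Word error rate (WER): word-level edit distance between a reference
# transcript and an ASR hypothesis, normalized by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

assert wer("glad to hear it", "glad to here it") == 0.25
```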
arXiv Detail & Related papers (2024-06-02T07:54:33Z)
- Probing Language Models' Gesture Understanding for Enhanced Human-AI Interaction [6.216023343793143]
This project aims to investigate the interaction between Large Language Models and non-verbal communication, specifically focusing on gestures.
The proposal sets out a plan to examine the proficiency of LLMs in deciphering both explicit and implicit non-verbal cues within textual prompts.
To assess LLMs' comprehension of gestures, experiments are planned that evaluate their ability to simulate human behaviour when replicating psycholinguistic experiments.
arXiv Detail & Related papers (2024-01-31T14:19:03Z)
- Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue [71.15186328127409]
We propose the Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), a model that takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework.
We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
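As a toy illustration of what such serialized input construction might look like, the sketch below invents tag names and ordering for readability; the actual model splices in continuous speech embeddings rather than a placeholder token.

```python
# Hypothetical serialization in the spirit of ParalinGPT's multitask prompt;
# all special tags here are invented for illustration.
def serialize_turn(history, sentiment, utterance):
    """Build one serialized example from a spoken-dialogue turn.

    history:   previous utterance transcripts (text context)
    sentiment: paralinguistic attribute label, e.g. "positive"
    utterance: current-turn transcript the model learns to predict
    """
    context = " ".join(f"<turn> {h}" for h in history)
    # <speech> marks where per-frame speech embeddings would be spliced in.
    return f"{context} <speech> <sentiment> {sentiment} <text> {utterance}"

example = serialize_turn(
    ["hi how are you", "pretty good thanks"],
    sentiment="positive",
    utterance="glad to hear it",
)
```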
arXiv Detail & Related papers (2023-12-23T18:14:56Z)
- Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training [56.74440457571821]
We analyze tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds.
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
arXiv Detail & Related papers (2023-10-25T09:09:55Z)
- AI Text-to-Behavior: A Study In Steerability [0.0]
The research explores the steerability of Large Language Models (LLMs).
We quantitatively gauged the model's responsiveness to tailored prompts using OCEAN, a behavioral psychology framework based on the Big Five personality traits.
Our findings underscore GPT's versatility and ability to discern and adapt to nuanced instructions.
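The measurement loop can be sketched as follows; `generate` and `trait_score` are hypothetical stand-ins for an LLM call and a trait classifier, and nothing here is taken from the paper itself.

```python
# Illustrative sketch: how strongly do persona-conditioned prompts shift
# generated text along the Big Five (OCEAN) traits?
OCEAN = ["openness", "conscientiousness", "extraversion",
         "agreeableness", "neuroticism"]

def steerability(task_prompt, generate, trait_score):
    """Return per-trait score shift: steered output minus baseline."""
    baseline = generate(task_prompt)
    shifts = {}
    for trait in OCEAN:
        steered = generate(f"Respond with very high {trait}. {task_prompt}")
        shifts[trait] = (trait_score(steered, trait)
                         - trait_score(baseline, trait))
    return shifts
```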
arXiv Detail & Related papers (2023-08-07T18:14:24Z)
- M-SENSE: Modeling Narrative Structure in Short Personal Narratives Using Protagonist's Mental Representations [14.64546899992196]
We propose the task of automatically detecting prominent elements of narrative structure by analyzing the role of characters' inferred mental states.
We introduce a STORIES dataset of short personal narratives containing manual annotations of key elements of narrative structure, specifically climax and resolution.
Our model is able to achieve significant improvements in the task of identifying climax and resolution.
arXiv Detail & Related papers (2023-02-18T20:48:02Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM that operates on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
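A minimal sketch of such a unit-level language model, assuming PyTorch; the architecture sizes are illustrative and not taken from the paper.

```python
# LSTM language model over discrete sub-word linguistic units
# (phonemes or syllables), trained by next-unit prediction.
import torch.nn as nn

class UnitLM(nn.Module):
    def __init__(self, n_units, emb_dim=128, hidden_dim=256, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_units, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, n_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_units)

    def forward(self, unit_ids, state=None):
        # unit_ids: (batch, seq_len) indices of phoneme/syllable tokens
        x = self.embed(unit_ids)
        out, state = self.lstm(x, state)
        return self.head(out), state  # next-unit logits per position

# Training would minimize cross-entropy between logits[:, :-1, :] and
# unit_ids[:, 1:], i.e. standard next-token prediction over units.
```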
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches, tracing their task-agnostic impact across generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the key techniques with respect to learning paradigms, pretraining, modeling approaches, and decoding, along with the key outstanding challenges in each.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)