Prediction of Listener Perception of Argumentative Speech in a
Crowdsourced Data Using (Psycho-)Linguistic and Fluency Features
- URL: http://arxiv.org/abs/2111.07130v1
- Date: Sat, 13 Nov 2021 15:07:13 GMT
- Title: Prediction of Listener Perception of Argumentative Speech in a
Crowdsourced Data Using (Psycho-)Linguistic and Fluency Features
- Authors: Yu Qiao, Sourabh Zanwar, Rishab Bhattacharyya, Daniel Wiechmann, Wei
Zhou, Elma Kerz, Ralf Schlüter
- Abstract summary: We aim to predict TED talk-style affective ratings in a crowdsourced dataset of argumentative speech.
We present an effective approach to the classification task of predicting these categories by fine-tuning a model pre-trained on a large dataset of public TED talk speeches.
- Score: 24.14001104126045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Among the key communicative competencies are the ability to maintain fluency
in monologic speech and the ability to produce sophisticated language to argue
a position convincingly. In this paper we aim to predict TED talk-style
affective ratings in a crowdsourced dataset of argumentative speech consisting
of 7 hours of speech from 110 individuals. The speech samples were elicited
through task prompts relating to three debating topics. The samples received a
total of 2211 ratings from 737 human raters pertaining to 14 affective
categories. We present an effective approach to the classification task of
predicting these categories by fine-tuning a model pre-trained on a large
dataset of public TED talk speeches. We use a combination of fluency features
derived from a state-of-the-art automatic speech recognition system and a large
set of human-interpretable linguistic features obtained from an automatic text
analysis system. Classification accuracy was greater than 60% for all 14 rating
categories, with a peak performance of 72% for the rating category
'informative'. In a secondary experiment, we determined the relative importance
of features from different groups using SP-LIME.
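Below is a minimal, hedged sketch of the recipe the abstract describes: fuse ASR-derived fluency features with human-interpretable linguistic features, train a per-category classifier, and rank features with SP-LIME. The feature names, the toy data, and the logistic-regression classifier are illustrative assumptions only; the paper itself fine-tunes a model pre-trained on TED talks rather than fitting a linear model from scratch.

```python
# Illustrative sketch (not the authors' code): fused-feature classification of
# one affective rating category, followed by SP-LIME feature ranking.
import numpy as np
from sklearn.linear_model import LogisticRegression
from lime.lime_tabular import LimeTabularExplainer
from lime import submodular_pick

rng = np.random.default_rng(0)

# Hypothetical feature vector: fluency features (derived from ASR output)
# concatenated with linguistic features (from an automatic text-analysis system).
feature_names = [
    "speech_rate", "pause_count", "mean_pause_duration",  # fluency (assumed names)
    "lexical_density", "syntactic_complexity",            # linguistic (assumed names)
]
X = rng.normal(size=(200, len(feature_names)))  # placeholder data, one row per sample
y = rng.integers(0, 2, size=200)                # e.g., 'informative': low (0) vs. high (1)

# Stand-in for the fine-tuned pre-trained model used in the paper.
clf = LogisticRegression().fit(X, y)

# SP-LIME selects a small, diverse set of instance-level explanations whose
# union covers the globally most important features.
explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["low", "high"],
    discretize_continuous=True,
)
sp = submodular_pick.SubmodularPick(
    explainer, X, clf.predict_proba,
    sample_size=20, num_features=5, num_exps_desired=3,
)
for explanation in sp.sp_explanations:
    print(explanation.as_list())
```

Aggregating the per-explanation feature weights over `sp.sp_explanations` is one plausible way to arrive at the group-level importances reported in the secondary experiment.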
Related papers
- Speechworthy Instruction-tuned Language Models [71.8586707840169]
We show that both prompting and preference learning increase the speech-suitability of popular instruction-tuned LLMs.
We share lexical, syntactical, and qualitative analyses to showcase how each method contributes to improving the speech-suitability of generated responses.
arXiv Detail & Related papers (2024-09-23T02:34:42Z)
- CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors.
We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models.
In our evaluations, CLAIR-A predicts human judgements of quality better than traditional metrics do (a generic sketch of this LLM-as-judge pattern appears after this list).
arXiv Detail & Related papers (2024-09-19T17:59:52Z)
- Estimating Contribution Quality in Online Deliberations Using a Large Language Model [4.911986505938227]
We use a large language model (LLM) alongside eight human annotators to rate contributions based on justification, novelty, expansion of the conversation, and potential for further expansion.
Using the average rating from other human annotators as the ground truth, we find the model outperforms individual human annotators.
We illustrate the usefulness of the automated quality rating by assessing the effect of nudges on the quality of deliberation.
arXiv Detail & Related papers (2024-08-21T18:41:32Z)
- EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation [83.29199726650899]
The EARS dataset comprises 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data.
The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech.
We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics.
arXiv Detail & Related papers (2024-06-10T11:28:29Z)
- Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue [71.15186328127409]
The Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT) model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework.
We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv Detail & Related papers (2023-12-23T18:14:56Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets [0.0]
We trained personalized models for 195 individuals with different types and severities of speech impairment.
For the home automation scenario, 79% of speakers reached the target WER with 18-20 minutes of speech; but even with only 3-4 minutes of speech, 63% of speakers reached the target WER.
arXiv Detail & Related papers (2021-10-09T17:11:17Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native automatic speech scoring (ASS), called speaker-conditioned hierarchical modeling.
We take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)
- Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases [11.3463024120429]
We develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases.
We collected samples from a diverse set of 661 speakers with a variety of self-reported speech disorders, each speaking 29 words or phrases.
arXiv Detail & Related papers (2021-07-08T17:24:25Z)
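Both CLAIR-A and the deliberation-quality paper above instantiate the same zero-shot LLM-as-judge pattern: prompt a large language model to score a candidate against a rubric or reference and parse a numeric result. The sketch below is a generic, provider-agnostic illustration; `complete` is a hypothetical stand-in for any text-completion API, and the prompt wording and 0-100 scale are assumptions, not either paper's exact setup.

```python
# Generic LLM-as-judge sketch: score a machine-generated audio caption against
# a reference caption and return a clamped numeric score with a short reason.
import json
from typing import Callable

JUDGE_PROMPT = """You are grading a machine-generated audio caption.
Reference caption: {reference}
Candidate caption: {candidate}
Respond with JSON only: {{"score": <integer 0-100>, "reason": "<one sentence>"}}"""

def judge_caption(reference: str, candidate: str,
                  complete: Callable[[str], str]) -> dict:
    """Ask an LLM to score `candidate` against `reference`; parse the JSON reply."""
    reply = complete(JUDGE_PROMPT.format(reference=reference, candidate=candidate))
    result = json.loads(reply)                                # assumes valid JSON back
    result["score"] = max(0, min(100, int(result["score"])))  # clamp to the 0-100 scale
    return result

# Usage with a stubbed model, just to show the flow end to end:
if __name__ == "__main__":
    fake_llm = lambda prompt: '{"score": 78, "reason": "Captures the barking dog but omits the rain."}'
    print(judge_caption("A dog barks while rain falls.", "A dog barking.", fake_llm))
```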