Dehumanizing Voice Technology: Phonetic & Experiential Consequences of
Restricted Human-Machine Interaction
- URL: http://arxiv.org/abs/2111.01934v1
- Date: Tue, 2 Nov 2021 22:49:25 GMT
- Title: Dehumanizing Voice Technology: Phonetic & Experiential Consequences of
Restricted Human-Machine Interaction
- Authors: Christian Hildebrand, Donna Hoffman, Tom Novak
- Abstract summary: We show that requests (vs. commands) lead to an increase in phonetic convergence and lower phonetic latency, and ultimately a more natural task experience for consumers.
We provide evidence that altering the required input to initiate a conversation with smart objects provokes systematic changes both in terms of consumers' subjective experience and objective phonetic changes in the human voice.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of natural language and voice-based interfaces gradually
transforms how consumers search, shop, and express their preferences. The
current work explores how changes in the syntactical structure of the
interaction with conversational interfaces (command- vs. request-based
expression modalities) negatively affect consumers' subjective task enjoyment
and systematically alter objective vocal features in the human voice. We show
that requests (vs. commands) lead to an increase in phonetic convergence and
lower phonetic latency, and ultimately a more natural task experience for
consumers. To the best of our knowledge, this is the first work documenting
that altering the input modality of how consumers interact with smart objects
systematically affects consumers' IoT experience. We provide evidence that
altering the required input to initiate a conversation with smart objects
provokes systematic changes both in terms of consumers' subjective experience
and objective phonetic changes in the human voice. The current research also
makes a methodological contribution by highlighting the unexplored potential
of feature extraction in human voice as a novel data format linking consumers'
vocal features during speech formation and their subjective task experiences.
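To make the methodological point concrete, here is a minimal sketch of the
kind of vocal feature extraction the abstract points to: response (phonetic)
latency and a pitch-based convergence proxy, computed with librosa. The file
names, thresholds, and the convergence proxy are illustrative assumptions;
the paper's actual pipeline is not specified here.

```python
# Hedged sketch: extract response latency and a crude pitch-convergence
# proxy from recorded turns. File names and parameters are hypothetical.
import librosa
import numpy as np

def response_latency(path, sr=16000, top_db=30):
    """Seconds of silence before speech onset in a recording."""
    y, sr = librosa.load(path, sr=sr)
    intervals = librosa.effects.split(y, top_db=top_db)  # non-silent spans
    if len(intervals) == 0:
        return None  # no speech detected
    return intervals[0][0] / sr  # onset of the first non-silent span

def median_pitch(path, sr=16000):
    """Median fundamental frequency (Hz) over voiced frames."""
    y, sr = librosa.load(path, sr=sr)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    return float(np.nanmedian(f0))  # f0 is NaN on unvoiced frames

# Convergence proxy: does the user's pitch move toward the assistant's
# voice over the session? A shrinking gap suggests phonetic convergence.
gap_early = abs(median_pitch("user_turn_01.wav") - median_pitch("assistant.wav"))
gap_late = abs(median_pitch("user_turn_10.wav") - median_pitch("assistant.wav"))
converging = gap_late < gap_early
```

Comparing such latency and convergence measures between command and request
conditions would mirror the analysis the abstract describes.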
Related papers
- Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks [30.33718960981521]
Personality recognition is useful for enhancing robots' ability to tailor user-adaptive responses.
One of the challenges in this task is the limited number of speakers in existing dialogue corpora.
arXiv Detail & Related papers (2024-01-11T12:27:33Z)
- End-to-End Continuous Speech Emotion Recognition in Real-life Customer Service Call Center Conversations [0.0]
We present our approach to constructing a large-scale, real-life dataset (CusEmo) for continuous SER in customer service call center conversations.
We adopted the dimensional emotion annotation approach to capture the subtlety, complexity, and continuity of emotions in real-life call center conversations.
The study also addresses the challenges encountered during the application of the End-to-End (E2E) SER system to the dataset.
arXiv Detail & Related papers (2023-10-02T11:53:48Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Interactive Conversational Head Generation [68.76774230274076]
We introduce a new conversation head generation benchmark for synthesizing behaviors of a single interlocutor in a face-to-face conversation.
The capability to automatically synthesize interlocutors that can participate in long, multi-turn conversations is vital and offers benefits for various applications.
arXiv Detail & Related papers (2023-07-05T08:06:26Z)
- Visual-Aware Text-to-Speech [101.89332968344102]
We present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and visual feedback of the listener in face-to-face communication.
We devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis.
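As an illustration of the fusion idea this entry describes (not the authors'
actual baseline), a minimal sketch might condition synthesis on phoneme
identities plus per-frame listener visual features; all dimensions and the
concat-and-project scheme below are assumptions.

```python
# Hedged sketch of phoneme + listener-visual fusion for conditioning a
# speech synthesizer. Dimensions and fusion scheme are illustrative.
import torch
import torch.nn as nn

class PhonemeVisualFusion(nn.Module):
    def __init__(self, n_phonemes=80, d_phon=256, d_vis=128, d_out=256):
        super().__init__()
        self.phon_emb = nn.Embedding(n_phonemes, d_phon)  # phoneme lookup
        self.vis_proj = nn.Linear(d_vis, d_phon)          # project visual cues
        self.fuse = nn.Linear(2 * d_phon, d_out)          # joint conditioning

    def forward(self, phonemes, visual):
        # phonemes: (B, T) phoneme ids; visual: (B, T, d_vis) listener
        # features (e.g., facial landmarks) time-aligned to the phonemes.
        p = self.phon_emb(phonemes)                       # (B, T, d_phon)
        v = self.vis_proj(visual)                         # (B, T, d_phon)
        return torch.relu(self.fuse(torch.cat([p, v], dim=-1)))
```

The fused sequence would then drive an ordinary TTS decoder in place of
text-only conditioning.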
arXiv Detail & Related papers (2023-06-21T05:11:39Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-level audio representations from individual utterances.
Second, an attentive bidirectional gated recurrent unit (GRU) models context-sensitive information and jointly explores intra- and inter-speaker dependencies (a sketch of this pipeline follows).
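Purely as an illustration of the pipeline described above, the sketch below
wires 128-dimensional VGGish-style segment embeddings into an attentive
bidirectional GRU; the hidden size, attention form, and number of emotion
classes are assumptions rather than the authors' configuration.

```python
# Hedged sketch: attentive BiGRU over per-segment audio embeddings.
import torch
import torch.nn as nn

class AttentiveBiGRU(nn.Module):
    def __init__(self, d_in=128, d_hid=128, n_emotions=4):
        super().__init__()
        self.gru = nn.GRU(d_in, d_hid, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * d_hid, 1)   # scalar score per segment
        self.head = nn.Linear(2 * d_hid, n_emotions)

    def forward(self, segments):
        # segments: (B, T, 128) VGGish-style embeddings for one utterance
        h, _ = self.gru(segments)                  # (B, T, 2*d_hid)
        w = torch.softmax(self.attn(h), dim=1)     # attention over segments
        context = (w * h).sum(dim=1)               # (B, 2*d_hid)
        return self.head(context)                  # emotion logits
```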
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue [8.933468765800518]
We first examine the existence of the entrainment phenomenon in human-to-human dialogues.
The analysis results show strong evidence of entrainment in terms of both acoustic and emotion features.
We implement two entrainment policies and assess whether integrating the entrainment principle into a Text-to-Speech (TTS) system improves synthesis performance and the user experience (a hedged sketch of one such policy follows).
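One plausible reading of an entrainment policy, sketched under assumptions
(the `synthesize` call and its parameters are hypothetical placeholders, and
the paper's actual policies may differ):

```python
# Hedged sketch: nudge TTS prosody targets toward the user's measured
# pitch and speaking rate. All defaults and the alpha blend are assumed.
def entrain(user_pitch_hz, user_rate_wps,
            base_pitch_hz=200.0, base_rate_wps=3.0, alpha=0.5):
    """Move a fraction `alpha` from the agent's default prosody toward
    the user's observed prosody (alpha=0: ignore user; 1: full match)."""
    return {
        "pitch_hz": base_pitch_hz + alpha * (user_pitch_hz - base_pitch_hz),
        "rate_wps": base_rate_wps + alpha * (user_rate_wps - base_rate_wps),
    }

# Hypothetical usage with some TTS front end:
# synthesize(text, **entrain(user_pitch_hz=180.0, user_rate_wps=2.5))
```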
arXiv Detail & Related papers (2022-12-07T01:45:15Z)
- Understanding How People Rate Their Conversations [73.17730062864314]
We conduct a study to better understand how people rate their interactions with conversational agents.
We focus on agreeableness and extraversion as variables that may explain variation in ratings.
arXiv Detail & Related papers (2022-06-01T00:45:32Z)
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline [58.168958284290156]
We define the responsive listening head generation task as the synthesis of a non-verbal head with motions and expressions reacting to multiple inputs.
Unlike speech-driven gesture or talking head generation, we introduce more modalities in this task, hoping to benefit several research fields.
arXiv Detail & Related papers (2021-12-27T07:18:50Z)
- Quantifying the Effects of Prosody Modulation on User Engagement and Satisfaction in Conversational Systems [10.102799140277932]
We report results from a large-scale empirical study that measures the effects of prosodic modulation on user behavior and engagement.
Our results indicate that prosody modulation significantly increases both immediate and overall user satisfaction.
Together, our results provide useful tools and insights for improving the naturalness of responses in conversational systems.
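The entry does not spell out its modulation mechanism here, but prosody
modulation of this kind is commonly expressed through SSML's standard
<prosody> element; the values below are illustrative only, not the study's.

```python
# Hedged sketch: wrap a response in SSML prosody controls (SSML 1.1).
def modulate(text, rate="95%", pitch="+5%"):
    """Return SSML that slows speech slightly and raises pitch a little."""
    return (f'<speak><prosody rate="{rate}" pitch="{pitch}">'
            f"{text}</prosody></speak>")

print(modulate("Sure, I found three results for you."))
```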
arXiv Detail & Related papers (2020-06-02T19:53:13Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.