Quantifying the Effects of Prosody Modulation on User Engagement and
Satisfaction in Conversational Systems
- URL: http://arxiv.org/abs/2006.01916v1
- Date: Tue, 2 Jun 2020 19:53:13 GMT
- Authors: Jason Ingyu Choi, Eugene Agichtein
- Abstract summary: We report results from a large-scale empirical study that measures the effects of prosodic modulation on user behavior and engagement.
Our results indicate that prosody modulation significantly increases both immediate and overall user satisfaction.
Together, our results provide useful tools and insights for improving the naturalness of responses in conversational systems.
- Score: 10.102799140277932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As voice-based assistants such as Alexa, Siri, and Google Assistant become
ubiquitous, users increasingly expect to maintain natural and informative
conversations with such systems. However, for an open-domain conversational
system to be coherent and engaging, it must be able to maintain the user's
interest for extended periods, without sounding boring or annoying. In this
paper, we investigate one natural approach to this problem, of modulating
response prosody, i.e., changing the pitch and cadence of the response to
indicate delight, sadness or other common emotions, as well as using
pre-recorded interjections. Intuitively, this approach should improve the
naturalness of the conversation, but attempts to quantify the effects of
prosodic modulation on user satisfaction and engagement remain challenging. To
accomplish this, we report results obtained from a large-scale empirical study
that measures the effects of prosodic modulation on user behavior and
engagement across multiple conversation domains, both immediately after each
turn, and at the overall conversation level. Our results indicate that prosody
modulation significantly increases both immediate and overall user
satisfaction. However, the effects vary across domains, and we verify that
prosody modulation does not substitute for coherent, informative response
content. Together, our results provide useful tools and insights for improving
the naturalness of responses in conversational systems.
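The pitch and cadence changes the abstract describes are commonly expressed in speech synthesis pipelines through SSML prosody markup. The sketch below is an illustrative assumption only, not the paper's actual implementation: the function name, tag values, and the interjection placeholder are hypothetical.

```python
def modulate_response(text, pitch="+15%", rate="95%", interjection=None):
    """Wrap a response in SSML <prosody> tags to shift pitch and speaking
    rate, optionally prefixing a pre-recorded interjection clip.

    The specific pitch/rate values are illustrative; a real system would
    choose them per detected emotion (delight, sadness, etc.).
    """
    prefix = f"<audio src='{interjection}'/>" if interjection else ""
    return (f"<speak>{prefix}"
            f"<prosody pitch='{pitch}' rate='{rate}'>{text}</prosody>"
            f"</speak>")

# Example: a delighted response preceded by a hypothetical interjection clip.
ssml = modulate_response("That was a great question!",
                         pitch="+20%", interjection="wow.mp3")
print(ssml)
```

Most commercial text-to-speech engines accept markup of this general shape, which is one reason prosody modulation is a practical intervention to A/B test at scale.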
Related papers
- Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations [17.409790984399052]
This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement.
Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues.
Results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension.
arXiv Detail & Related papers (2024-06-21T09:26:55Z)
- Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [50.35367785674921]
Listener head generation centers on generating non-verbal behaviors of a listener in reference to the information delivered by a speaker.
A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation.
We propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords.
Our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude.
arXiv Detail & Related papers (2023-09-29T18:18:32Z)
- Improving Empathetic Dialogue Generation by Dynamically Infusing Commonsense Knowledge [39.536604198392375]
In empathetic conversations, individuals express their empathy towards others.
Previous work has mainly focused on generating empathetic responses by utilizing the speaker's emotion.
We propose a novel approach for empathetic response generation, which incorporates an adaptive module for commonsense knowledge selection.
arXiv Detail & Related papers (2023-05-24T10:25:12Z)
- Feedback Effect in User Interaction with Intelligent Assistants: Delayed Engagement, Adaption and Drop-out [9.205174767678365]
This paper identifies and quantifies the feedback effect, a novel component in IA-user interactions.
We show that unhelpful responses from the IA cause users to delay or reduce subsequent interactions.
As users discover the limitations of the IA's understanding and functional capabilities, they learn to adjust the scope and wording of their requests.
arXiv Detail & Related papers (2023-03-17T21:39:33Z)
- Turn-Taking Prediction for Natural Conversational Speech [40.189938418201656]
A common conversational utterance often involves multiple queries with turn-taking.
Disfluencies include pausing to think, hesitations, word lengthening, filled pauses and repeated phrases.
We present a turn-taking predictor built on top of the end-to-end (E2E) speech recognizer.
arXiv Detail & Related papers (2022-08-29T01:09:23Z)
- Understanding How People Rate Their Conversations [73.17730062864314]
We conduct a study to better understand how people rate their interactions with conversational agents.
We focus on agreeableness and extraversion as variables that may explain variation in ratings.
arXiv Detail & Related papers (2022-06-01T00:45:32Z)
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy [83.61404191470126]
We propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting.
Specifically, we learn the trade-off via a learned goal weight, which consists of four factors.
The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
arXiv Detail & Related papers (2022-04-07T14:11:31Z)
- Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that this can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
- Dehumanizing Voice Technology: Phonetic & Experiential Consequences of Restricted Human-Machine Interaction [0.0]
We show that requests lead to an increase in phonetic convergence and lower phonetic latency, and ultimately a more natural task experience for consumers.
We provide evidence that altering the required input to initiate a conversation with smart objects provokes systematic changes both in terms of consumers' subjective experience and objective phonetic changes in the human voice.
arXiv Detail & Related papers (2021-11-02T22:49:25Z)
- Improving Factual Consistency Between a Response and Persona Facts [64.30785349238619]
Neural models for response generation produce responses that are semantically plausible but not necessarily factually consistent with facts describing the speaker's persona.
We propose to fine-tune these models by reinforcement learning and an efficient reward function that explicitly captures the consistency between a response and persona facts as well as semantic plausibility.
arXiv Detail & Related papers (2020-04-30T18:08:22Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all linked content) and is not responsible for any consequences of its use.