Towards Understanding Confusion and Affective States Under Communication
Failures in Voice-Based Human-Machine Interaction
- URL: http://arxiv.org/abs/2207.07693v1
- Date: Fri, 15 Jul 2022 18:32:48 GMT
- Title: Towards Understanding Confusion and Affective States Under Communication
Failures in Voice-Based Human-Machine Interaction
- Authors: Sujeong Kim, Abhinav Garlapati, Jonah Lubin, Amir Tamrakar, Ajay
Divakaran
- Abstract summary: We present a series of two studies conducted to understand users' affective states during voice-based human-machine interactions.
The studies consist of two types of tasks: (1) tasks related to communication with a voice-based virtual agent, i.e., speaking to the machine and understanding what the machine says, and (2) non-communication-related problem-solving tasks in which the participants solve puzzles and riddles but are asked to verbally explain the answers to the machine.
- Score: 8.602681427083553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a series of two studies conducted to understand users'
affective states during voice-based human-machine interactions. Emphasis is
placed on cases of communication errors or failures. In particular, we are
interested in understanding "confusion" in relation to other affective states.
The studies consist of two types of tasks: (1) tasks related to communication
with a voice-based virtual agent, i.e., speaking to the machine and
understanding what the machine says, and (2) non-communication-related
problem-solving tasks in which the participants solve puzzles and riddles but
are asked to verbally explain the answers to the machine. We collected
audio-visual data and self-reports of the participants' affective states. We
report the results of the two studies and our analysis of the collected data.
The first study was analyzed based on annotator observations, and the second
based on the participants' self-reports.
Related papers
- Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort [0.0]
This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions.
We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information.
The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches.
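As a concrete illustration of the fusion comparison above, a minimal sketch of early versus late fusion over precomputed per-clip audio and facial (Action Unit / gaze) features follows; the feature dimensions and the logistic-regression classifier are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative sketch (not the paper's code): early vs. late fusion of
# per-clip audio features and facial features (e.g., Action Units, gaze).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
audio_feats = rng.normal(size=(n, 40))    # assumed audio descriptors per clip
face_feats = rng.normal(size=(n, 20))     # assumed AU/gaze descriptors per clip
labels = rng.integers(0, 2, size=n)       # toy labels: 1 = deceptive, 0 = truthful

# Early fusion: concatenate modalities, then train a single classifier.
early_clf = LogisticRegression(max_iter=1000)
early_clf.fit(np.hstack([audio_feats, face_feats]), labels)

# Late fusion: train one classifier per modality, then average their scores.
audio_clf = LogisticRegression(max_iter=1000).fit(audio_feats, labels)
face_clf = LogisticRegression(max_iter=1000).fit(face_feats, labels)
late_scores = 0.5 * (audio_clf.predict_proba(audio_feats)[:, 1]
                     + face_clf.predict_proba(face_feats)[:, 1])
late_preds = (late_scores >= 0.5).astype(int)
```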
arXiv Detail & Related papers (2025-06-26T16:11:42Z)
- Analysing Explanation-Related Interactions in Collaborative Perception-Cognition-Communication-Action [1.33828830691279]
We analyse and classify communications among human participants collaborating to complete a simulated emergency response task.
We find that most explanation-related messages seek clarification of the decisions or actions taken.
arXiv Detail & Related papers (2024-11-19T13:07:04Z)
- A Novel Labeled Human Voice Signal Dataset for Misbehavior Detection [0.7223352886780369]
This research highlights the significance of voice tone and delivery in automated machine-learning systems for voice analysis and recognition.
It contributes to the broader field of voice signal analysis by elucidating the impact of human behaviour on the perception and categorization of voice signals.
arXiv Detail & Related papers (2024-06-28T18:55:07Z)
- The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems [0.11470070927586018]
We evaluate 5 major commercial ASR systems for their conversational and multilingual support.
We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge.
Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify phenomena that need most attention on the way to build robust interactive speech technologies.
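For reference, the word error rate behind that finding is the word-level edit distance (substitutions + deletions + insertions) divided by the length of the reference transcript. A minimal sketch, not the paper's evaluation code:

```python
# Minimal word error rate (WER): word-level Levenshtein distance between
# hypothesis and reference, divided by the number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("please book a table for two", "please look a table for"))  # 2/6 ~ 0.33
```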
arXiv Detail & Related papers (2023-07-28T11:38:05Z)
- DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning [89.92601337474954]
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations.
We introduce a novel challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic reasoning and situated conversational understanding.
arXiv Detail & Related papers (2023-06-15T10:41:23Z)
- Expanding the Role of Affective Phenomena in Multimodal Interaction Research [57.069159905961214]
We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing.
We identify 910 affect-related papers and present our analysis of the role of affective phenomena in these papers.
We find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
arXiv Detail & Related papers (2023-05-18T09:08:39Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representations from individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly.
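A rough sketch of the pipeline described above, assuming precomputed 128-dimensional VGGish segment embeddings and illustrative layer sizes; the speaker-sensitive, inter-speaker modeling mentioned in the summary is omitted.

```python
# Rough sketch of the described pipeline: segment-level audio embeddings
# (e.g., 128-d VGGish vectors per segment) -> bidirectional GRU ->
# additive attention pooling -> emotion logits. Sizes are assumptions.
import torch
import torch.nn as nn

class SegmentEmotionGRU(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_emotions=4):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # scores each segment
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, segments):                    # (batch, n_segments, feat_dim)
        context, _ = self.gru(segments)             # (batch, n_segments, 2*hidden)
        weights = torch.softmax(self.attn(context), dim=1)
        pooled = (weights * context).sum(dim=1)     # attention-weighted utterance vector
        return self.classifier(pooled)

model = SegmentEmotionGRU()
fake_vggish = torch.randn(8, 10, 128)               # 8 utterances x 10 segments each
logits = model(fake_vggish)                          # (8, n_emotions)
```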
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- Question-Interlocutor Scope Realized Graph Modeling over Key Utterances for Dialogue Reading Comprehension [61.55950233402972]
We propose a new key-utterance extraction method for dialogue reading comprehension.
It performs prediction on units formed by several contiguous utterances, which can capture more answer-containing utterances.
We then propose Question-Interlocutor Scope Realized Graph (QuISG) modeling, a graph constructed over the text of the utterances.
arXiv Detail & Related papers (2022-10-26T04:00:42Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
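A minimal sketch of such a probing setup, with a frozen pretrained encoder and a trainable feed-forward head; the backbone name, pooling choice, and label count below are placeholders rather than the paper's configuration.

```python
# Sketch of a probing classifier: a frozen pretrained LM provides features,
# and only a small feed-forward head is trained on annotated dialogue labels.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"                  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
for p in encoder.parameters():                    # keep the LM fixed
    p.requires_grad = False

probe = nn.Sequential(nn.Linear(encoder.config.hidden_size, 256),
                      nn.ReLU(),
                      nn.Linear(256, 5))          # e.g., 5 dialogue-act labels
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

batch = tokenizer(["i need a cheap hotel in the centre"],
                  return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state[:, 0]   # [CLS] representation
logits = probe(hidden)
loss = nn.functional.cross_entropy(logits, torch.tensor([2]))  # toy label
loss.backward()
optimizer.step()
```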
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all of its information) and is not responsible for any consequences of its use.