Towards Understanding Confusion and Affective States Under Communication
Failures in Voice-Based Human-Machine Interaction
- URL: http://arxiv.org/abs/2207.07693v1
- Date: Fri, 15 Jul 2022 18:32:48 GMT
- Title: Towards Understanding Confusion and Affective States Under Communication
Failures in Voice-Based Human-Machine Interaction
- Authors: Sujeong Kim, Abhinav Garlapati, Jonah Lubin, Amir Tamrakar, Ajay
Divakaran
- Abstract summary: We present a series of two studies conducted to understand users' affective states during voice-based human-machine interactions.
The studies consist of two types of tasks: (1) tasks related to communication with a voice-based virtual agent, i.e., speaking to the machine and understanding what the machine says, and (2) non-communication-related problem-solving tasks in which the participants solve puzzles and riddles but are asked to verbally explain the answers to the machine.
- Score: 8.602681427083553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a series of two studies conducted to understand users'
affective states during voice-based human-machine interactions. Emphasis is
placed on cases of communication errors or failures. In particular, we are
interested in understanding "confusion" in relation to other affective states.
The studies consist of two types of tasks: (1) tasks related to communication
with a voice-based virtual agent, i.e., speaking to the machine and
understanding what the machine says, and (2) non-communication-related
problem-solving tasks in which the participants solve puzzles and riddles but
are asked to verbally explain the answers to the machine. We collected
audio-visual data and self-reports of the participants' affective states. We
report the results of the two studies and our analysis of the collected data.
The first study was analyzed based on annotator observations, and the second
based on the participants' self-reports.
Related papers
- Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort [0.0]
This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions.
We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information.
The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches.
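As a concrete illustration of the fusion comparison above, a minimal sketch of early versus late fusion over precomputed per-clip audio and facial (Action Unit / gaze) features follows; the feature dimensions and the logistic-regression classifier are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative sketch (not the paper's code): early vs. late fusion of
# per-clip audio features and facial features (e.g., Action Units, gaze).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
audio_feats = rng.normal(size=(n, 40))    # assumed audio descriptors per clip
face_feats = rng.normal(size=(n, 20))     # assumed AU/gaze descriptors per clip
labels = rng.integers(0, 2, size=n)       # toy labels: 1 = deceptive, 0 = truthful

# Early fusion: concatenate modalities, then train a single classifier.
early_clf = LogisticRegression(max_iter=1000)
early_clf.fit(np.hstack([audio_feats, face_feats]), labels)

# Late fusion: train one classifier per modality, then average their scores.
audio_clf = LogisticRegression(max_iter=1000).fit(audio_feats, labels)
face_clf = LogisticRegression(max_iter=1000).fit(face_feats, labels)
late_scores = 0.5 * (audio_clf.predict_proba(audio_feats)[:, 1]
                     + face_clf.predict_proba(face_feats)[:, 1])
late_preds = (late_scores >= 0.5).astype(int)
```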
arXiv Detail & Related papers (2025-06-26T16:11:42Z)
- Analysing Explanation-Related Interactions in Collaborative Perception-Cognition-Communication-Action [1.33828830691279]
We analyse and classify communications among human participants collaborating to complete a simulated emergency response task.
We find that most explanation-related messages seek clarification of the decisions or actions taken.
arXiv Detail & Related papers (2024-11-19T13:07:04Z)
- A Novel Labeled Human Voice Signal Dataset for Misbehavior Detection [0.7223352886780369]
This research highlights the significance of voice tone and delivery in automated machine-learning systems for voice analysis and recognition.
It contributes to the broader field of voice signal analysis by elucidating the impact of human behaviour on the perception and categorization of voice signals.
arXiv Detail & Related papers (2024-06-28T18:55:07Z)
- The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems [0.11470070927586018]
We evaluate 5 major commercial ASR systems for their conversational and multilingual support.
We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge.
Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify phenomena that need most attention on the way to build robust interactive speech technologies.
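For reference, the word error rate behind that finding is the word-level edit distance (substitutions + deletions + insertions) divided by the length of the reference transcript. A minimal sketch, not the paper's evaluation code:

```python
# Minimal word error rate (WER): word-level Levenshtein distance between
# hypothesis and reference, divided by the number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("please book a table for two", "please look a table for"))  # 2/6 ~ 0.33
```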
arXiv Detail & Related papers (2023-07-28T11:38:05Z)
- DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning [89.92601337474954]
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations.
We introduce a novel challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic reasoning and situated conversational understanding.
arXiv Detail & Related papers (2023-06-15T10:41:23Z)
- Expanding the Role of Affective Phenomena in Multimodal Interaction Research [57.069159905961214]
We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing.
We identify 910 affect-related papers and present our analysis of the role of affective phenomena in these papers.
We find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
arXiv Detail & Related papers (2023-05-18T09:08:39Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representations from individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly.
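A rough sketch of the pipeline described above, assuming precomputed 128-dimensional VGGish segment embeddings and illustrative layer sizes; the speaker-sensitive, inter-speaker modeling mentioned in the summary is omitted.

```python
# Rough sketch of the described pipeline: segment-level audio embeddings
# (e.g., 128-d VGGish vectors per segment) -> bidirectional GRU ->
# additive attention pooling -> emotion logits. Sizes are assumptions.
import torch
import torch.nn as nn

class SegmentEmotionGRU(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_emotions=4):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # scores each segment
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, segments):                    # (batch, n_segments, feat_dim)
        context, _ = self.gru(segments)             # (batch, n_segments, 2*hidden)
        weights = torch.softmax(self.attn(context), dim=1)
        pooled = (weights * context).sum(dim=1)     # attention-weighted utterance vector
        return self.classifier(pooled)

model = SegmentEmotionGRU()
fake_vggish = torch.randn(8, 10, 128)               # 8 utterances x 10 segments each
logits = model(fake_vggish)                          # (8, n_emotions)
```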
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- Question-Interlocutor Scope Realized Graph Modeling over Key Utterances for Dialogue Reading Comprehension [61.55950233402972]
We propose a new key-utterance extraction method for dialogue reading comprehension.
It performs prediction on units formed by several contiguous utterances, which can capture more answer-containing utterances.
We then propose Question-Interlocutor Scope Realized Graph (QuISG) modeling, a graph constructed over the text of the utterances.
arXiv Detail & Related papers (2022-10-26T04:00:42Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
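A minimal sketch of such a probing setup, with a frozen pretrained encoder and a trainable feed-forward head; the backbone name, pooling choice, and label count below are placeholders rather than the paper's configuration.

```python
# Sketch of a probing classifier: a frozen pretrained LM provides features,
# and only a small feed-forward head is trained on annotated dialogue labels.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"                  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
for p in encoder.parameters():                    # keep the LM fixed
    p.requires_grad = False

probe = nn.Sequential(nn.Linear(encoder.config.hidden_size, 256),
                      nn.ReLU(),
                      nn.Linear(256, 5))          # e.g., 5 dialogue-act labels
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

batch = tokenizer(["i need a cheap hotel in the centre"],
                  return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state[:, 0]   # [CLS] representation
logits = probe(hidden)
loss = nn.functional.cross_entropy(logits, torch.tensor([2]))  # toy label
loss.backward()
optimizer.step()
```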
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all of its information) and is not responsible for any consequences of its use.