Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems
- URL: http://arxiv.org/abs/2410.03147v1
- Date: Fri, 4 Oct 2024 05:07:55 GMT
- Title: Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems
- Authors: Mikey Elmers, Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara
- Abstract summary: We analyzed user spoken behaviors in both attentive listening and job interview dialogue scenarios.
Results revealed significant differences in metrics such as speech length, speaking rate, fillers, backchannels, disfluencies, and laughter.
We developed predictive models to distinguish between operator and autonomous system conditions.
- Score: 21.938414385824903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study examined users' behavioral differences in a large corpus of Japanese human-robot interactions, comparing interactions between a tele-operated robot and an autonomous dialogue system. We analyzed user spoken behaviors in both attentive listening and job interview dialogue scenarios. Results revealed significant differences in metrics such as speech length, speaking rate, fillers, backchannels, disfluencies, and laughter between operator-controlled and autonomous conditions. Furthermore, we developed predictive models to distinguish between operator and autonomous system conditions. Our models demonstrated higher accuracy and precision compared to the baseline model, with several models also achieving a higher F1 score than the baseline.
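As a minimal sketch of the classification setup the abstract describes (per-dialogue speech features predicting the operator vs. autonomous condition, compared against a baseline on accuracy, precision, and F1), one could train a logistic-regression model over stand-in features. The feature set, model family, and synthetic data below are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of the described setup: per-dialogue speech features
# predicting the condition label (operator vs. autonomous), evaluated
# against a majority-class baseline on accuracy, precision, and F1.
# Features, model family, and the synthetic data are assumptions.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-ins for: speech length, speaking rate, filler rate,
# backchannel rate, disfluency rate, laughter count.
X = rng.normal(size=(200, 6))
y = rng.integers(0, 2, size=200)  # 1 = operator (WoZ), 0 = autonomous

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)

for name, clf in [("baseline", baseline), ("model", model)]:
    pred = clf.predict(X_te)
    print(name,
          f"acc={accuracy_score(y_te, pred):.2f}",
          f"prec={precision_score(y_te, pred, zero_division=0):.2f}",
          f"f1={f1_score(y_te, pred, zero_division=0):.2f}")
```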
Related papers
- Does the Appearance of Autonomous Conversational Robots Affect User Spoken Behaviors in Real-World Conference Interactions? [19.873188667424024]
We compare a human-like android, ERICA, with a less anthropomorphic humanoid, TELECO.
The results show that participants produced fewer disfluencies and employed more complex syntax when interacting with ERICA.
We conclude that designing robots to elicit more structured and fluent user speech can enhance their communicative alignment with humans.
arXiv Detail & Related papers (2025-03-17T18:20:30Z)
- A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment [18.814181652728486]
We propose a noise-robust voice activity projection (VAP) model to enhance real-time turn-taking in dialogue robots.
We conducted a field experiment in a shopping mall, comparing the VAP system with a conventional cloud-based speech recognition system.
The results showed that the proposed system significantly reduced response latency, leading to a more natural conversation.
arXiv Detail & Related papers (2025-03-08T14:53:20Z)
- Applying General Turn-taking Models to Conversational Human-Robot Interaction [3.8673630752805446]
This paper investigates the application of general turn-taking models, specifically TurnGPT and Voice Activity Projection (VAP), to improve conversational dynamics in HRI.
We propose methods for using these models in tandem to predict when a robot should begin preparing responses, take turns, and handle potential interruptions.
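A hedged sketch of such tandem logic is shown below; the two predictor interfaces and the thresholds are hypothetical stand-ins, not the paper's actual models or API:

```python
# Hypothetical sketch of combining a lexical end-of-turn predictor
# (TurnGPT-style) with an acoustic voice activity projection (VAP-style)
# signal. Both predictor functions and the thresholds are stand-ins.

def turngpt_eot_prob(dialogue_text: str) -> float:
    """Stand-in for a TurnGPT-style lexical end-of-turn probability."""
    return 0.8 if dialogue_text.rstrip().endswith((".", "?", "!")) else 0.2

def vap_user_continues_prob(audio_window) -> float:
    """Stand-in for a VAP-style probability that the user keeps speaking."""
    return 0.1  # pretend the acoustics project imminent silence

def turn_taking_decision(dialogue_text, audio_window,
                         prepare_th=0.5, take_th=0.75):
    eot = turngpt_eot_prob(dialogue_text)
    user_continues = vap_user_continues_prob(audio_window)
    prepare = eot >= prepare_th  # start generating a response early
    # Take the turn only when the lexical and acoustic signals agree.
    take = eot >= take_th and user_continues < (1 - take_th)
    return prepare, take

print(turn_taking_decision("Could you repeat that?", audio_window=None))
# (True, True): prepare a response and take the turn
```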
arXiv Detail & Related papers (2025-01-15T16:49:22Z)
- A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation [39.87346821309096]
We present an addressee estimation model with improved performance compared with the previous state of the art (SOTA).
We also propose several ways to incorporate explainability and transparency in the aforementioned architecture.
arXiv Detail & Related papers (2024-05-20T13:09:32Z)
- Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks [0.3222802562733786]
We present an unsupervised deep learning framework that derives meaningful representations from textual features for modeling semantic entrainment.
The results show that our model can assess semantic entrainment, that it can distinguish between human-human (HH) and human-machine (HM) interactions, and that the two units of analysis used for extracting acoustic features yield comparable findings.
arXiv Detail & Related papers (2023-12-22T22:33:54Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard deep learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the implementation procedure and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction [2.3590037806133024]
This paper presents a novel conversational model for human-robot interaction that rests upon a graph-based representation of the dialogue state.
The neural conversational model employed to respond to user utterances relies on a simple but effective graph-to-text mechanism.
The proposed approach is empirically evaluated through a user study with a humanoid robot.
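As an illustration of a graph-to-text step of this kind, a dialogue-state graph can be linearized into model input text; the triple schema and template below are assumptions, not the paper's mechanism:

```python
# Minimal sketch of graph-to-text linearization: dialogue-state triples
# are flattened into a string that a seq2seq response generator can
# condition on. The triple schema and template are illustrative.
dialogue_state = [
    ("user", "asked_about", "opening hours"),
    ("museum", "opens_at", "9am"),
    ("museum", "closes_at", "5pm"),
]

def linearize(triples):
    """Flatten (subject, relation, object) triples into input text."""
    return " | ".join(f"{s} {r.replace('_', ' ')} {o}" for s, r, o in triples)

print(linearize(dialogue_state))
# user asked about opening hours | museum opens at 9am | museum closes at 5pm
```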
arXiv Detail & Related papers (2023-11-03T15:44:28Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling [104.09033240889106]
We introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues.
We then compare a structured utterance-based approach of using pre-trained Transformer models for contradiction detection with the typical unstructured approach.
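A rough sketch of an utterance-pairing contradiction check in this spirit, using an off-the-shelf NLI model as a stand-in rather than DECODE's fine-tuned structured models:

```python
# Score a candidate reply against each earlier utterance by the same
# speaker with an off-the-shelf NLI model and keep the maximum
# contradiction score. Stand-in for DECODE's fine-tuned approach.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def max_contradiction(history_same_speaker, candidate_reply):
    scores = []
    for utt in history_same_speaker:
        out = nli({"text": utt, "text_pair": candidate_reply}, top_k=None)
        scores.append(next(d["score"] for d in out
                           if d["label"] == "CONTRADICTION"))
    return max(scores, default=0.0)

history = ["I don't eat fish.", "I love going to aquariums."]
print(max_contradiction(history, "I like fish, especially dolphins."))
```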
arXiv Detail & Related papers (2020-12-24T18:47:49Z)
- Open-Ended Multi-Modal Relational Reasoning for Video Question Answering [1.8699569122464073]
The primary focus of the proposed robotic agent is to assist individuals through language-based interactions within video-based scenes.
Our proposed method integrates video recognition technology and natural language processing models within the robotic agent.
arXiv Detail & Related papers (2020-12-01T20:49:59Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
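A small sketch of attribute-bucketed evaluation in this spirit; the attribute choice (entity length) and the recall metric are illustrative, not the paper's exact tool:

```python
# Group gold entities by an attribute (here, length in tokens) and
# report per-bucket recall against the model's predictions.
from collections import defaultdict

gold = [("New York", "LOC"), ("Obama", "PER"), ("European Central Bank", "ORG")]
pred = {("New York", "LOC"), ("Obama", "PER")}  # model output as a set

by_len = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
for ent in gold:
    bucket = len(ent[0].split())      # attribute: length in tokens
    by_len[bucket][1] += 1
    by_len[bucket][0] += ent in pred

for bucket, (correct, total) in sorted(by_len.items()):
    print(f"entity length {bucket}: recall {correct / total:.2f}")
```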
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
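A minimal probe in this spirit, assuming GPT-2 as a stand-in model rather than the network studied in the paper:

```python
# Compare the model's next-word probability for plural vs. singular verb
# forms after a plural subject with an intervening singular distractor.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prefix = "The keys to the cabinet"  # plural subject, singular distractor
ids = tok(prefix, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

for verb in [" are", " is"]:  # correct agreement should score higher
    vid = tok.encode(verb)[0]
    print(repr(verb), f"p={probs[vid].item():.4f}")
```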
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Collaborative Motion Prediction via Neural Motion Message Passing [37.72454920355321]
We propose neural motion message passing (NMMP) to explicitly model the interaction and learn representations for directed interactions between actors.
Based on the proposed NMMP, we design the motion prediction systems for two settings: the pedestrian setting and the joint pedestrian and vehicle setting.
Both systems outperform the previous state-of-the-art methods on several existing benchmarks.
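A toy sketch of one round of directed message passing between actor embeddings, with illustrative dimensions and update functions rather than the NMMP architecture itself:

```python
# One round of directed message passing between actor embeddings;
# dimensions, message, and update functions are illustrative.
import torch
import torch.nn as nn

n_actors, d = 4, 16
h = torch.randn(n_actors, d)  # per-actor embeddings
edges = [(i, j) for i in range(n_actors) for j in range(n_actors) if i != j]

msg_fn = nn.Linear(2 * d, d)  # message from sender i to receiver j
upd_fn = nn.GRUCell(d, d)     # node update from aggregated messages

# m_ij = f([h_i ; h_j]) is direction-sensitive: m_ij != m_ji in general.
inbox = [[] for _ in range(n_actors)]
for i, j in edges:
    inbox[j].append(msg_fn(torch.cat([h[i], h[j]])))
agg = torch.stack([torch.stack(m).sum(dim=0) for m in inbox])
h_new = upd_fn(agg, h)        # updated actor representations
print(h_new.shape)            # torch.Size([4, 16])
```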
arXiv Detail & Related papers (2020-03-14T10:12:54Z)