Towards Objective Evaluation of Socially-Situated Conversational Robots:
Assessing Human-Likeness through Multimodal User Behaviors
- URL: http://arxiv.org/abs/2308.11020v2
- Date: Mon, 25 Sep 2023 12:10:30 GMT
- Title: Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors
- Authors: Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
- Abstract summary: This paper focuses on assessing the human-likeness of the robot as the primary evaluation metric.
Our approach evaluates the robot's human-likeness indirectly, through observable user behaviors, thus enhancing objectivity and reproducibility.
- Score: 26.003947740875482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper tackles the challenging task of evaluating socially situated
conversational robots and presents a novel objective evaluation approach that
relies on multimodal user behaviors. In this study, our main focus is on
assessing the human-likeness of the robot as the primary evaluation metric.
While previous research often relied on subjective evaluations from users, our
approach aims to evaluate the robot's human-likeness indirectly, through
observable user behaviors, thus enhancing objectivity and reproducibility. To begin,
we created an annotated dataset of human-likeness scores, utilizing user
behaviors found in an attentive listening dialogue corpus. We then conducted an
analysis to determine the correlation between multimodal user behaviors and
human-likeness scores, demonstrating the feasibility of our proposed
behavior-based evaluation method.
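The analysis step the abstract describes lends itself to a small sketch: given per-dialogue behavior features and annotated human-likeness scores, compute a rank correlation per feature. The feature names and file layout below are hypothetical placeholders, not the paper's actual corpus schema.

```python
# Sketch of the correlation analysis described in the abstract: per-dialogue
# multimodal user-behavior features vs. annotated human-likeness scores.
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical CSV: one row per dialogue segment, columns are assumed names.
df = pd.read_csv("annotated_dialogues.csv")
behavior_features = ["backchannel_rate", "gaze_at_robot_ratio", "mean_pitch", "turn_gap_ms"]

for feat in behavior_features:
    rho, p = spearmanr(df[feat], df["human_likeness_score"])
    print(f"{feat}: rho={rho:.3f} (p={p:.3g})")
```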
Related papers
- Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models [26.333097337393685]
The tendency of users to anthropomorphise large language models (LLMs) is of growing interest to AI developers, researchers, and policy-makers.
Here, we present a novel method for empirically evaluating anthropomorphic LLM behaviours in realistic and varied settings.
First, we develop a multi-turn evaluation of 14 anthropomorphic behaviours.
Second, we present a scalable, automated approach that employs simulations of user interactions.
Third, we conduct an interactive, large-scale human subject study (N=1101) to validate that the model behaviours we measure predict real users' anthropomorphic perceptions.
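A minimal sketch of such a simulation-driven, multi-turn evaluation loop appears below. The behaviour detectors, prompts, and model stub are illustrative stand-ins, not the paper's 14-behaviour taxonomy or prompts.

```python
# Toy multi-turn loop: a simulated user converses with a target model, and each
# reply is checked for anthropomorphic behaviours (here, crude regex detectors).
import re

ANTHRO_PATTERNS = {  # illustrative detectors, not the paper's taxonomy
    "claims_of_feeling": re.compile(r"\bI (feel|am feeling)\b", re.I),
    "claims_of_experience": re.compile(r"\bwhen I was\b", re.I),
}

def simulated_user_turn(turn_idx: int) -> str:
    prompts = ["How was your day?", "Do you ever get lonely?", "Tell me about your childhood."]
    return prompts[turn_idx % len(prompts)]

def target_model_reply(user_msg: str) -> str:
    return "I feel great today, thanks for asking!"  # stub; replace with a real model call

counts = {name: 0 for name in ANTHRO_PATTERNS}
for t in range(3):  # multi-turn dialogue
    reply = target_model_reply(simulated_user_turn(t))
    for name, pat in ANTHRO_PATTERNS.items():
        counts[name] += bool(pat.search(reply))
print(counts)
```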
arXiv Detail & Related papers (2025-02-10T22:09:57Z)
- Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning [0.47355466227925036]
Explanation is a fundamentally human process. Understanding the goal and audience of the explanation is vital.
Existing work on explainable reinforcement learning (XRL) routinely does not consult humans in its evaluations.
This paper calls on researchers to use objective human metrics for explanation evaluations based on observable and actionable behaviour.
arXiv Detail & Related papers (2025-01-31T16:12:23Z)
- ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can lead raters to conflate fluency with truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert scales.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z)
- Beyond Static Evaluation: A Dynamic Approach to Assessing AI Assistants' API Invocation Capabilities [48.922660354417204]
We propose Automated Dynamic Evaluation (AutoDE) to assess an assistant's API call capability without human involvement.
In our framework, we endeavor to closely mirror genuine human conversation patterns in human-machine interactions.
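A hedged sketch of this kind of dynamic check: a scripted user simulator reveals information across turns, and the test passes only if the assistant eventually emits the expected API call. The assistant and simulator below are stubs, not AutoDE's implementation.

```python
# Dynamic API-invocation check: success means the assistant produces the
# expected call once the simulated user has revealed enough slots.
EXPECTED_CALL = {"name": "book_flight", "args": {"from": "OSL", "to": "NRT"}}

def user_simulator(turn: int) -> str:
    script = ["I need a flight.", "From Oslo.", "To Tokyo Narita, please."]
    return script[turn]

def assistant(history: list[str]) -> dict | None:
    # Stub: a real assistant would decide when it has enough slots filled.
    if any("Narita" in h for h in history):
        return {"name": "book_flight", "args": {"from": "OSL", "to": "NRT"}}
    return None

history: list[str] = []
success = False
for turn in range(3):
    history.append(user_simulator(turn))
    if assistant(history) == EXPECTED_CALL:
        success = True
        break
print("API call correct:", success)
```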
arXiv Detail & Related papers (2024-03-17T07:34:12Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
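As a deliberately simplified stand-in for the idea (the deployed system is a deep model running on the iCub), one can frame addressee estimation as classification over a few gaze and pose features; the features and data below are synthetic.

```python
# Toy addressee classifier: is the robot being addressed in this frame?
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Features per frame: [head_yaw_toward_robot, gaze_on_robot_ratio, speaking]
X = rng.random((200, 3))
y = (0.6 * X[:, 0] + 0.4 * X[:, 1] > 0.5).astype(int)  # toy labelling rule

clf = LogisticRegression().fit(X, y)
frame = np.array([[0.9, 0.8, 1.0]])  # speaker faces the robot while talking
print("P(robot is addressee) =", clf.predict_proba(frame)[0, 1])
```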
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Gaze-based intention estimation: principles, methodologies, and applications in HRI [0.0]
This review aims to connect insights from the psychological literature on visuomotor control with relevant applications of gaze-based intention recognition.
The use of eye tracking and gaze-based models for intent recognition in Human-Robot Interaction is considered.
arXiv Detail & Related papers (2023-02-09T09:44:13Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras.
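Given that pairing of speaking activity and support vector machines, a toy illustration of the survey's most common recipe might look as follows; the features and "dominant speaker" labels are synthetic placeholders.

```python
# SVM on per-participant speaking statistics, the survey's most common combination.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Features per participant: [speaking_time_ratio, num_turns, num_interruptions]
X = rng.random((120, 3))
y = (X[:, 0] + 0.3 * X[:, 2] > 0.7).astype(int)  # toy "dominant speaker" label

svm = SVC(kernel="rbf").fit(X[:100], y[:100])
print("held-out accuracy:", svm.score(X[100:], y[100:]))
```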
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA requires only a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
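ENIGMA's estimator itself is more sophisticated; for background only, the sketch below shows the simplest off-policy evaluation baseline it improves upon: trajectory-wise importance sampling that reweights logged human scores by target-versus-behavior policy likelihood ratios. All numbers are illustrative.

```python
# Self-normalized trajectory-wise importance sampling over logged dialogues.
import numpy as np

# Logged data: per-dialogue human score, plus per-turn action probabilities
# under the behavior policy (which collected the data) and the target policy.
scores = np.array([4.0, 2.5, 3.5])
behavior_probs = [np.array([0.5, 0.4]), np.array([0.6, 0.5]), np.array([0.3, 0.7])]
target_probs = [np.array([0.6, 0.5]), np.array([0.4, 0.5]), np.array([0.5, 0.6])]

weights = np.array([np.prod(t / b) for t, b in zip(target_probs, behavior_probs)])
estimate = np.sum(weights * scores) / np.sum(weights)  # self-normalized estimate
print("estimated human score under target policy:", round(estimate, 3))
```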
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
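As a toy illustration of the "mutual persona perception" signal only (P2 Bot learns it with neural scorers and reinforcement learning), one could score how much a response reflects the partner's persona:

```python
# Bag-of-words overlap between a response and persona lines: a crude stand-in
# for the learned persona-perception reward, for intuition only.
def persona_perception_score(response: str, persona: list[str]) -> float:
    resp_words = set(response.lower().split())
    persona_words = set(" ".join(persona).lower().split())
    return len(resp_words & persona_words) / max(len(persona_words), 1)

persona = ["i love hiking in the mountains", "i have two dogs"]
print(persona_perception_score("Do your dogs join you hiking?", persona))
```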
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.