Towards Objective Evaluation of Socially-Situated Conversational Robots:
Assessing Human-Likeness through Multimodal User Behaviors
- URL: http://arxiv.org/abs/2308.11020v2
- Date: Mon, 25 Sep 2023 12:10:30 GMT
- Title: Towards Objective Evaluation of Socially-Situated Conversational Robots:
Assessing Human-Likeness through Multimodal User Behaviors
- Authors: Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
- Abstract summary: This paper focuses on assessing the human-likeness of the robot as the primary evaluation metric.
Our approach aims to evaluate the robot's human-likeness indirectly, based on observable user behaviors, thus enhancing objectivity and reproducibility.
- Score: 26.003947740875482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper tackles the challenging task of evaluating socially situated
conversational robots and presents a novel objective evaluation approach that
relies on multimodal user behaviors. In this study, our main focus is on
assessing the human-likeness of the robot as the primary evaluation metric.
While previous research often relied on subjective evaluations from users, our
approach evaluates the robot's human-likeness indirectly, based on observable
user behaviors, thus enhancing objectivity and reproducibility. To begin,
we created an annotated dataset of human-likeness scores, utilizing user
behaviors found in an attentive listening dialogue corpus. We then conducted an
analysis to determine the correlation between multimodal user behaviors and
human-likeness scores, demonstrating the feasibility of our proposed
behavior-based evaluation method.
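The core of the proposed method is a correlation analysis between features derived from multimodal user behaviors and the annotated human-likeness scores. As a rough illustration of such an analysis, a minimal Python sketch is shown below; the feature names, toy values, and library choice are assumptions for exposition, not the authors' pipeline or corpus format.

```python
# Illustrative sketch only: feature names and values are hypothetical,
# not the authors' released code or the attentive listening corpus format.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-dialogue features extracted from multimodal user behavior,
# paired with annotated human-likeness scores for the same dialogues.
features = {
    "backchannel_rate": np.array([0.12, 0.30, 0.22, 0.41, 0.18]),
    "nod_frequency": np.array([1.5, 3.2, 2.1, 4.0, 1.9]),
    "mean_utterance_length": np.array([6.3, 8.1, 7.4, 9.0, 5.8]),
}
human_likeness = np.array([3.1, 4.2, 3.6, 4.5, 2.9])  # annotated scores

# Rank correlation between each behavior feature and the human-likeness score.
for name, values in features.items():
    rho, p = spearmanr(values, human_likeness)
    print(f"{name}: rho={rho:.2f}, p={p:.3f}")
```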
Related papers
- ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of ratings on scales such as Likert.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z)
- A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation [39.87346821309096]
We present an addressee estimation model with improved performance in comparison with the previous SOTA.
We also propose several ways to incorporate explainability and transparency in the aforementioned architecture.
arXiv Detail & Related papers (2024-05-20T13:09:32Z)
- Beyond Static Evaluation: A Dynamic Approach to Assessing AI Assistants' API Invocation Capabilities [48.922660354417204]
We propose Automated Dynamic Evaluation (AutoDE) to assess an assistant's API call capability without human involvement.
In our framework, we endeavor to closely mirror genuine human conversation patterns in human-machine interactions.
arXiv Detail & Related papers (2024-03-17T07:34:12Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Robust Robot Planning for Human-Robot Collaboration [11.609195090422514]
In human-robot collaboration, the objectives of the human are often unknown to the robot.
We propose an approach to automatically generate an uncertain model of human behavior (a policy) for each given objective function.
We also propose a robot planning algorithm that is robust to the above-mentioned uncertainties.
arXiv Detail & Related papers (2023-02-27T16:02:48Z)
- Gaze-based intention estimation: principles, methodologies, and applications in HRI [0.0]
This review aims to connect insights from the psychological literature on visuomotor control with relevant applications of gaze-based intention recognition.
The use of eye tracking and gaze-based models for intent recognition in Human-Robot Interaction is considered.
arXiv Detail & Related papers (2023-02-09T09:44:13Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores (a generic off-policy evaluation sketch follows at the end of this list).
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
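The ENIGMA entry above concerns off-policy evaluation: estimating how a target dialogue policy would be scored using only pre-collected interaction logs, without new human interaction. As a generic, hedged illustration of that idea (a textbook per-decision importance-sampling estimator, not ENIGMA's actual model-free method; all data below is toy), a sketch might look like:

```python
# Generic off-policy evaluation via per-decision importance sampling.
# This is NOT the ENIGMA estimator; all quantities are made-up toy data.
import numpy as np

def per_decision_is(episodes, gamma=0.99):
    """Estimate the target policy's expected return from logged episodes.

    Each episode is a list of (p_target, p_behavior, reward) tuples, where the
    probabilities are the target and behavior policies' probabilities of the
    logged action, collected while running the behavior policy.
    """
    estimates = []
    for episode in episodes:
        ratio, value = 1.0, 0.0
        for t, (p_target, p_behavior, reward) in enumerate(episode):
            ratio *= p_target / p_behavior      # cumulative importance weight
            value += (gamma ** t) * ratio * reward
        estimates.append(value)
    return float(np.mean(estimates))

# Toy logged dialogues evaluated without any new human interaction.
logged = [
    [(0.9, 0.5, 1.0), (0.7, 0.6, 0.0)],
    [(0.4, 0.5, 0.0), (0.8, 0.4, 1.0)],
]
print(per_decision_is(logged))
```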