ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
- URL: http://arxiv.org/abs/2507.13468v1
- Date: Thu, 17 Jul 2025 18:21:45 GMT
- Title: ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
- Authors: Shiye Cao, Maia Stiber, Amama Mahmood, Maria Teresa Parreira, Wendy Ju, Micol Spitale, Hatice Gunes, Chien-Ming Huang
- Abstract summary: The ERR@HRI 2.0 Challenge provides a dataset of conversational robot failures during human-robot conversations. The dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features. Participants are invited to form teams and develop machine learning models that detect these failures using multimodal data.
- Score: 15.140345369639215
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of large language models (LLMs) into conversational robots has made human-robot conversations more dynamic. Yet, LLM-powered conversational robots remain prone to errors, e.g., misunderstanding user intent, prematurely interrupting users, or failing to respond altogether. Detecting and addressing these failures is critical for preventing conversational breakdowns, avoiding task disruptions, and sustaining user trust. To tackle this problem, the ERR@HRI 2.0 Challenge provides a multimodal dataset of LLM-powered conversational robot failures during human-robot conversations and encourages researchers to benchmark machine learning models designed to detect robot failures. The dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features. Each interaction is annotated with the presence or absence of robot errors from the system perspective, and perceived user intention to correct for a mismatch between robot behavior and user expectation. Participants are invited to form teams and develop machine learning models that detect these failures using multimodal data. Submissions will be evaluated using various performance metrics, including detection accuracy and false positive rate. This challenge represents another key step toward improving failure detection in human-robot interaction through social signal analysis.
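As a rough illustration of the detection task and the metrics named above, here is a minimal sketch of an early-fusion classifier over per-window multimodal features, reporting detection accuracy and false positive rate. The feature arrays, window counts, and labels are hypothetical placeholders, not the challenge's actual data format.

```python
# Minimal sketch of a multimodal failure detector, NOT the official
# ERR@HRI 2.0 baseline. Feature arrays and labels are hypothetical
# stand-ins for the challenge's facial, speech, and head-movement features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_windows = 2000                            # hypothetical number of time windows
facial = rng.normal(size=(n_windows, 17))   # e.g., facial action units
speech = rng.normal(size=(n_windows, 13))   # e.g., MFCC-style features
head   = rng.normal(size=(n_windows, 3))    # e.g., head pose angles
y = rng.integers(0, 2, size=n_windows)      # 1 = robot error in this window

# Early fusion: concatenate the per-window modality features.
X = np.hstack([facial, speech, head])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

accuracy = (pred == y_te).mean()
# False positive rate: fraction of error-free windows flagged as errors.
fp = ((pred == 1) & (y_te == 0)).sum()
tn = ((pred == 0) & (y_te == 0)).sum()
fpr = fp / (fp + tn)
print(f"accuracy={accuracy:.3f}  false positive rate={fpr:.3f}")
```

In practice, a challenge submission would replace the synthetic arrays with the provided interaction features and would likely use a temporal model rather than a per-window linear classifier.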
Related papers
- Why Robots Are Bad at Detecting Their Mistakes: Limitations of Miscommunication Detection in Human-Robot Dialogue [0.6118899177909359]
This research evaluates the effectiveness of machine learning models in detecting miscommunications in robot dialogue. After each conversational turn, users provided feedback on whether they perceived an error, enabling an analysis of the models' ability to accurately detect robot mistakes.
arXiv Detail & Related papers (2025-06-25T09:25:04Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a wide variety of environments and overcoming a broad range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- Human-Robot Interaction and Perceived Irrationality: A Study of Trust Dynamics and Error Acknowledgment [0.0]
This study systematically examines trust dynamics and system design by analyzing human reactions to robot failures. We conducted a four-stage survey to explore how trust evolves throughout human-robot interactions. Results indicate that trust in robotic systems significantly increased when robots acknowledged their errors or limitations.
arXiv Detail & Related papers (2024-03-21T11:00:11Z)
- ROS-Causal: A ROS-based Causal Analysis Framework for Human-Robot Interaction Applications [3.8625803348911774]
This paper introduces ROS-Causal, a framework for causal discovery in human-robot spatial interactions.
An ad-hoc simulator, integrated with ROS, illustrates the approach's effectiveness.
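As a loose illustration of the kind of causal question such a framework addresses (which agent's motion drives the other's), here is a toy lagged-correlation check on synthetic trajectories; it is not the ROS-Causal algorithm, and all signals are fabricated for the sketch.

```python
# Toy illustration of a directionality check on human-robot trajectories;
# simple lagged correlation, NOT the ROS-Causal discovery algorithm.
import numpy as np

rng = np.random.default_rng(3)
T = 500
human = rng.normal(size=T).cumsum()                        # fake human trajectory (1-D)
robot = np.roll(human, 5) + rng.normal(scale=0.5, size=T)  # robot follows with lag 5

def lagged_corr(x, y, max_lag=20):
    """Return (peak correlation, lag) at which x best predicts y."""
    lags = range(1, max_lag + 1)
    scores = [np.corrcoef(x[:-k], y[k:])[0, 1] for k in lags]
    return max(zip(scores, lags))

score, lag = lagged_corr(human, robot)
print(f"human -> robot strongest at lag {lag} (corr {score:.2f})")
```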
arXiv Detail & Related papers (2024-02-25T11:37:23Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru [9.2526849536751]
We introduce a fully-automated conversation system that leverages large language models (LLMs) to generate robot responses with expressive behaviors.
We conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts.
Most negative feedback was due to automatic speech recognition (ASR) errors, which had limited impact on conversations.
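A minimal sketch of how such a pipeline might pair an LLM reply with an expressive behavior tag follows; the call_llm stub, the JSON schema, and the behavior vocabulary are assumptions for illustration, not the paper's actual system.

```python
# Hypothetical sketch of pairing LLM-generated replies with expressive
# behavior tags, loosely in the spirit of the system described above.
import json

def call_llm(prompt: str) -> str:
    """Stub for a real LLM API call; returns a canned JSON reply here."""
    return json.dumps({"utterance": "That sounds fun!", "behavior": "nod_happy"})

BEHAVIORS = {"nod_happy", "tilt_curious", "shake_sad"}  # hypothetical vocabulary

def respond(user_text: str) -> tuple[str, str]:
    prompt = (
        "Reply to the user and pick one behavior from "
        f"{sorted(BEHAVIORS)}. Answer as JSON with keys "
        f"'utterance' and 'behavior'.\nUser: {user_text}"
    )
    reply = json.loads(call_llm(prompt))
    behavior = reply.get("behavior", "nod_happy")
    if behavior not in BEHAVIORS:       # guard against malformed LLM output
        behavior = "nod_happy"
    return reply["utterance"], behavior

print(respond("Let's play a word game!"))
```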
arXiv Detail & Related papers (2024-02-18T12:35:52Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Continuous ErrP detections during multimodal human-robot interaction [2.5199066832791535]
We implement a multimodal human-robot interaction (HRI) scenario, in which a simulated robot communicates with its human partner through speech and gestures.
The human partner, in turn, evaluates whether the robot's verbal announcement (intention) matches the action (pointing gesture) chosen by the robot.
Intrinsic evaluations of robot actions by humans, evident in the EEG, were recorded in real time, continuously segmented online, and classified asynchronously.
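A schematic sketch of that continuous pipeline follows, assuming a hypothetical sampling rate, window size, and threshold classifier in place of the paper's actual ErrP detector.

```python
# Illustrative sliding-window ErrP detection loop; parameters and the
# classifier are assumptions, not the paper's actual pipeline.
import numpy as np

FS = 250                 # hypothetical sampling rate (Hz)
WIN = FS                 # 1 s windows
STEP = FS // 4           # 250 ms hop -> overlapping windows

def classify_window(window: np.ndarray) -> bool:
    """Stand-in classifier: flags a window whose mean amplitude is high
    (arbitrary threshold, purely for demonstration)."""
    return window.mean(axis=1).max() > 0.15

rng = np.random.default_rng(1)
stream = rng.normal(size=(8, 60 * FS))   # 8 channels, 60 s of fake EEG

detections = []
for start in range(0, stream.shape[1] - WIN + 1, STEP):
    window = stream[:, start:start + WIN]
    if classify_window(window):          # asynchronous: fires whenever it matches
        detections.append(start / FS)    # detection time in seconds

print(f"{len(detections)} candidate ErrP windows")
```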
arXiv Detail & Related papers (2022-07-25T15:39:32Z)
- Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Planning motions that take human comfort into account has typically not been part of the human-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
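To illustrate the receding-horizon idea behind model-predictive control, here is a toy 1-D version: candidate action sequences are scored for goal progress and smoothness, and only the first action of the best plan is executed before replanning. The dynamics and cost are assumptions, not the paper's formulation.

```python
# Toy receding-horizon (MPC-style) loop for a 1-D reach toward a handover
# point; dynamics and smoothness cost are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
goal, x, v = 1.0, 0.0, 0.0          # target position, state (pos, vel)
H, DT = 10, 0.05                    # horizon steps, timestep

for _ in range(60):
    candidates = rng.uniform(-2.0, 2.0, size=(64, H))   # sampled accel sequences
    best_cost, best_a = np.inf, 0.0
    for seq in candidates:
        px, pv, cost = x, v, 0.0
        for a in seq:
            pv += a * DT
            px += pv * DT
            cost += (px - goal) ** 2 + 0.1 * a ** 2     # track goal, stay smooth
        if cost < best_cost:
            best_cost, best_a = cost, seq[0]
    v += best_a * DT                 # execute only the first action, replan next tick
    x += v * DT

print(f"final position {x:.3f} (goal {goal})")
```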
arXiv Detail & Related papers (2022-03-31T23:08:20Z)
- Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs [90.20235972293801]
Aiming to understand how human (false-)belief, a core socio-cognitive ability, would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.
An inference algorithm is derived to fuse the individual parse graphs (pg) from all robots across multiple views into a joint pg, which affords more effective reasoning and inference capability to overcome the errors originating from a single view.
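As a loose analogy for that fusion step (not the paper's parse-graph algorithm), the sketch below merges per-view object-state estimates by confidence-weighted voting, so a single occluded or mistaken view can be outvoted.

```python
# Hypothetical multi-view fusion of object-state estimates by
# confidence-weighted voting; not the paper's parse-graph algorithm.
from collections import defaultdict

# Each robot/view reports (state, confidence) per object; view 3 is wrong about the cup.
views = [
    {"cup": ("on_table", 0.9), "drawer": ("closed", 0.6)},
    {"cup": ("on_table", 0.8), "drawer": ("open", 0.7)},
    {"cup": ("in_hand", 0.3), "drawer": ("open", 0.8)},
]

def fuse(views):
    votes = defaultdict(lambda: defaultdict(float))
    for view in views:
        for obj, (state, conf) in view.items():
            votes[obj][state] += conf        # accumulate confidence per state
    # Joint estimate: highest-confidence state per object.
    return {obj: max(states, key=states.get) for obj, states in votes.items()}

print(fuse(views))   # {'cup': 'on_table', 'drawer': 'open'}
```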
arXiv Detail & Related papers (2020-04-25T23:02:04Z)
- An Attention Transfer Model for Human-Assisted Failure Avoidance in Robot Manipulations [2.745883395089022]
A novel human-to-robot attention transfer (H2R-AT) method was developed to identify robot manipulation errors.
H2R-AT was developed by fusing an attention-mapping mechanism into a novel stacked neural network model.
The method's effectiveness was validated by a high accuracy of 73.68% in transferring attention and a high accuracy of 66.86% in avoiding grasping failures.
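For intuition about attention-based weighting in general, here is a schematic numpy sketch in which a query vector produces an attention map over region features; it is a generic stand-in, not the H2R-AT architecture.

```python
# Schematic attention-style feature weighting; an illustrative stand-in,
# not the H2R-AT architecture from the paper.
import numpy as np

rng = np.random.default_rng(4)
regions = rng.normal(size=(16, 32))     # 16 image regions x 32-dim features
query = rng.normal(size=32)             # e.g., derived from a human verbal cue

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

attn = softmax(regions @ query)         # attention map over regions
attended = attn @ regions               # attention-weighted feature (32-dim)
w, b = rng.normal(size=32), 0.0         # toy linear "failure risk" head
risk = 1 / (1 + np.exp(-(attended @ w + b)))
print(f"predicted failure risk: {risk:.2f}")
```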
arXiv Detail & Related papers (2020-02-11T07:58:48Z)