Related papers: Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru

Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru

URL: http://arxiv.org/abs/2402.11571v1
Date: Sun, 18 Feb 2024 12:35:52 GMT
Title: Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru
Authors: Zining Wang and Paul Reisert and Eric Nichols and Randy Gomez
Abstract summary: We introduce a fully-automated conversation system that leverages large language models (LLMs) to generate robot responses with expressive behaviors. We conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts. Most negative feedback was due to automatic speech recognition (ASR) errors which had limited impact on conversations.
Score: 9.2526849536751
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Social robots aim to establish long-term bonds with humans through engaging conversation. However, traditional conversational approaches, reliant on scripted interactions, often fall short in maintaining engaging conversations. This paper addresses this limitation by integrating large language models (LLMs) into social robots to achieve more dynamic and expressive conversations. We introduce a fully-automated conversation system that leverages LLMs to generate robot responses with expressive behaviors, congruent with the robot's personality. We incorporate robot behavior with two modalities: 1) a text-to-speech (TTS) engine capable of various delivery styles, and 2) a library of physical actions for the robot. We develop a custom, state-of-the-art emotion recognition model to dynamically select the robot's tone of voice and utilize emojis from LLM output as cues for generating robot actions. A demo of our system is available here. To illuminate design and implementation issues, we conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts. Feedback was overwhelmingly positive, with participants commenting on the robot's empathy, helpfulness, naturalness, and entertainment. Most negative feedback was due to automatic speech recognition (ASR) errors which had limited impact on conversations. However, we observed a small class of errors, such as the LLM repeating itself or hallucinating fictitious information and human responses, that have the potential to derail conversations, raising important issues for LLM application.

Related papers

RoboOmni: Proactive Robot Manipulation in Omni-modal Context [165.09049429566238]
We introduce cross-modal contextual instructions, a new setting where intent is derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands.<n>We present RoboOmni, a framework based on end-to-end omni-modal LLMs that unifies intention recognition, interaction confirmation, and action execution.<n>Experiments in simulation and real-world settings show that Robo Omni surpasses text- and ASR-based baselines in success rate, inference speed, intention recognition, and proactive assistance.
arXiv Detail & Related papers (2025-10-27T18:49:03Z)
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations [15.140345369639215]
ERR@HRI 2.0 Challenge provides a dataset of conversational robot failures during human-robot conversations.<n> dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features.<n>Participants are invited to form teams and develop machine learning models that detect these failures using multimodal data.
arXiv Detail & Related papers (2025-07-17T18:21:45Z)
Towards an LLM-Based Speech Interface for Robot-Assisted Feeding [9.528060348251584]
Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots. In this work, we demonstrate an LLM-based speech interface for a commercially available assistive feeding robot.
arXiv Detail & Related papers (2024-10-27T22:56:51Z)
LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction. Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision. We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots [9.528060348251584]
Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots. Frameworks for integrating LLMs as interfaces to robots for high level task planning and code generation have been proposed, but fail to incorporate human-centric considerations. We present a framework for incorporating LLMs as speech interfaces for physically assistive robots, constructed iteratively with 3 stages of testing involving a feeding robot, culminating in an evaluation with 11 older adults at an independent living facility.
arXiv Detail & Related papers (2024-04-05T12:45:10Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models [23.945922720555146]
We propose a system to achieve incremental learning of complex behavior from natural interaction. We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6.
arXiv Detail & Related papers (2023-09-08T13:29:05Z)
Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models [2.5489046505746704]
We design and label four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot. Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures. Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.
arXiv Detail & Related papers (2023-08-31T08:20:04Z)
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) and existing visual grounding and robotic grasping system. We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration. We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions. Our framework included a guiding robot and an interlocutor model that plays the role of humans. We examined our framework using three experimental setups and evaluate the guiding robot with four different metrics to demonstrated flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms that humans rely on when it comes to filtering speech in noisy environments. Having a robot that can look toward a speaker could benefit ASR performance in challenging environments. We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
arXiv Detail & Related papers (2020-11-12T18:02:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.