Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in
Conversations with the Tabletop Robot Haru
- URL: http://arxiv.org/abs/2402.11571v1
- Date: Sun, 18 Feb 2024 12:35:52 GMT
- Authors: Zining Wang and Paul Reisert and Eric Nichols and Randy Gomez
- Abstract summary: We introduce a fully-automated conversation system that leverages large language models (LLMs) to generate robot responses with expressive behaviors.
We conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts.
Most negative feedback was due to automatic speech recognition (ASR) errors, which had limited impact on conversations.
- Score: 9.2526849536751
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social robots aim to establish long-term bonds with humans through engaging
conversation. However, traditional conversational approaches, reliant on
scripted interactions, often fall short in maintaining engaging conversations.
This paper addresses this limitation by integrating large language models
(LLMs) into social robots to achieve more dynamic and expressive conversations.
We introduce a fully-automated conversation system that leverages LLMs to
generate robot responses with expressive behaviors, congruent with the robot's
personality. We incorporate robot behavior with two modalities: 1) a
text-to-speech (TTS) engine capable of various delivery styles, and 2) a
library of physical actions for the robot. We develop a custom,
state-of-the-art emotion recognition model to dynamically select the robot's
tone of voice and utilize emojis from LLM output as cues for generating robot
actions. A demo of our system is available here. To illuminate design and
implementation issues, we conduct a pilot study where volunteers chat with a
social robot using our proposed system, and we analyze their feedback,
conducting a rigorous error analysis of chat transcripts. Feedback was
overwhelmingly positive, with participants commenting on the robot's empathy,
helpfulness, naturalness, and entertainment. Most negative feedback was due to
automatic speech recognition (ASR) errors, which had limited impact on
conversations. However, we observed a small class of errors, such as the LLM
repeating itself or hallucinating fictitious information and human responses,
that have the potential to derail conversations, raising important issues for
LLM application.
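The two behavior channels described in the abstract (a TTS delivery style chosen from a recognized emotion, and physical actions cued by emojis in the LLM output) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the emoji table, the action names, the TTS style names, and the function names are hypothetical and are not taken from the paper. The emotion recognition model is not implemented; its label is simply passed in as a string.

```python
import re

# Hypothetical emoji-to-action table; the action names are illustrative only
# and do not come from the paper's action library.
EMOJI_ACTIONS = {
    "😊": "smile",
    "😂": "laugh",
    "😢": "sad_bow",
    "👋": "wave",
}

# Hypothetical mapping from an emotion label to a TTS delivery style.
EMOTION_TTS_STYLE = {
    "joy": "cheerful",
    "sadness": "gentle",
    "neutral": "default",
}


def split_response(llm_text):
    """Separate speakable text from the emoji cues in an LLM response."""
    # Collect one action per known emoji, in order of appearance.
    actions = [EMOJI_ACTIONS[ch] for ch in llm_text if ch in EMOJI_ACTIONS]
    # Drop the known emojis so the TTS engine does not try to read them.
    spoken = "".join(ch for ch in llm_text if ch not in EMOJI_ACTIONS)
    # Collapse the whitespace left behind by removed emojis.
    spoken = re.sub(r"\s+", " ", spoken).strip()
    return spoken, actions


def plan_behavior(llm_text, emotion_label):
    """Combine speech text, a TTS style, and action cues into one plan."""
    spoken, actions = split_response(llm_text)
    # Unknown emotion labels fall back to the default delivery style.
    style = EMOTION_TTS_STYLE.get(emotion_label, "default")
    return {"speech": spoken, "tts_style": style, "actions": actions}


plan = plan_behavior("Hello there! 👋 Nice to meet you 😊", "joy")
print(plan)
```

In this sketch the emojis act only as cues and are stripped before the text would be handed to a speech engine; a real system would also need to handle emojis outside the table and multi-codepoint emoji sequences.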
Related papers
- Towards an LLM-Based Speech Interface for Robot-Assisted Feeding [9.528060348251584]
Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots.
In this work, we demonstrate an LLM-based speech interface for a commercially available assistive feeding robot.
arXiv Detail & Related papers (2024-10-27T22:56:51Z)
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
- VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots [9.528060348251584]
Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots.
Frameworks for integrating LLMs as interfaces to robots for high-level task planning and code generation have been proposed, but fail to incorporate human-centric considerations.
We present a framework for incorporating LLMs as speech interfaces for physically assistive robots, constructed iteratively with 3 stages of testing involving a feeding robot, culminating in an evaluation with 11 older adults at an independent living facility.
arXiv Detail & Related papers (2024-04-05T12:45:10Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models [23.945922720555146]
We propose a system to achieve incremental learning of complex behavior from natural interaction.
We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6.
arXiv Detail & Related papers (2023-09-08T13:29:05Z)
- Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models [2.5489046505746704]
We design and label four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot.
Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures.
Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.
arXiv Detail & Related papers (2023-08-31T08:20:04Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of humans.
We examined our framework using three experimental setups and evaluated the guiding robot with four different metrics to demonstrate flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms that humans rely on when it comes to filtering speech in noisy environments.
Having a robot that can look toward a speaker could benefit ASR performance in challenging environments.
We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
arXiv Detail & Related papers (2020-11-12T18:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.