Dobby: A Conversational Service Robot Driven by GPT-4
- URL: http://arxiv.org/abs/2310.06303v1
- Date: Tue, 10 Oct 2023 04:34:00 GMT
- Title: Dobby: A Conversational Service Robot Driven by GPT-4
- Authors: Carson Stark, Bohkyung Chun, Casey Charleston, Varsha Ravi, Luis
Pabon, Surya Sunkari, Tarun Mohan, Peter Stone, and Justin Hart
- Abstract summary: This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for service tasks.
The agent is derived from a large language model, which has learned from a vast corpus of general knowledge.
In addition to generating dialogue, this agent can interface with the physical world by invoking commands on the robot.
- Score: 22.701223191699412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces a robotics platform which embeds a conversational AI
agent in an embodied system for natural language understanding and intelligent
decision-making for service tasks; integrating task planning and human-like
conversation. The agent is derived from a large language model, which has
learned from a vast corpus of general knowledge. In addition to generating
dialogue, this agent can interface with the physical world by invoking commands
on the robot; seamlessly merging communication and behavior. This system is
demonstrated in a free-form tour-guide scenario, in an HRI study combining
robots with and without conversational AI capabilities. Performance is measured
along five dimensions: overall effectiveness, exploration abilities,
scrutinization abilities, receptiveness to personification, and adaptability.
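The architecture the abstract describes, a language-model agent whose replies can both carry dialogue and invoke commands on the robot, can be illustrated with a minimal sketch. This is an illustrative assumption of how such a loop might look, not the paper's implementation: the GPT-4 call is stubbed out, and the command names (go_to, speak) and the call syntax are hypothetical.

```python
import re

# Hypothetical robot primitives exposed to the language model (assumed names,
# not the paper's actual interface).
def go_to(location: str) -> str:
    print(f"[robot] navigating to {location}")
    return f"arrived at {location}"

def speak(text: str) -> str:
    print(f"[robot] saying: {text}")
    return "done"

COMMANDS = {"go_to": go_to, "speak": speak}

def llm_reply(history):
    """Stand-in for a GPT-4 chat completion. A real system would send the
    dialogue history plus a system prompt listing the available commands."""
    return 'speak("Welcome! Follow me to the robotics lab.") go_to("robotics lab")'

def interaction_turn(history, user_utterance):
    """One turn: ask the model for a reply, then execute any embedded
    command calls on the robot and record their results in the history."""
    history.append({"role": "user", "content": user_utterance})
    reply = llm_reply(history)
    history.append({"role": "assistant", "content": reply})
    for name, arg in re.findall(r'(\w+)\("([^"]*)"\)', reply):
        if name in COMMANDS:
            result = COMMANDS[name](arg)
            history.append({"role": "system", "content": f"{name}: {result}"})

if __name__ == "__main__":
    interaction_turn([], "Can you give me a tour of the building?")
```

In a deployed version the stub would be replaced by an actual GPT-4 call, with command results fed back into the conversation so that dialogue and behavior stay coupled, in the spirit of what the abstract describes.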
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Simulating User Agents for Embodied Conversational-AI [9.402740034754455]
We build a large language model (LLM)-based user agent that can simulate user behavior during interactions with an embodied agent.
We evaluate our user agent's ability to generate human-like behaviors by comparing its simulated dialogues with the TEACh dataset.
arXiv Detail & Related papers (2024-10-31T00:56:08Z)
- Interpreting and learning voice commands with a Large Language Model for a robot system [0.0]
The use of Large Language Models (LLMs) like GPT-4 has enhanced robot capabilities, allowing for real-time interaction and decision-making.
This project focuses on merging LLMs with databases to improve decision-making and enable knowledge acquisition for request interpretation problems.
arXiv Detail & Related papers (2024-07-31T10:30:31Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation on robot manipulation tasks expressed in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Exploring Large Language Models to Facilitate Variable Autonomy for Human-Robot Teaming [4.779196219827508]
We introduce a novel framework for a GPT-powered multi-robot testbed environment, based on a Unity Virtual Reality (VR) setting.
This system allows users to interact with robot agents through natural language, each powered by individual GPT cores.
A user study with 12 participants explores the effectiveness of GPT-4 and, more importantly, user strategies when being given the opportunity to converse in natural language within a multi-robot environment.
arXiv Detail & Related papers (2023-12-12T12:26:48Z)
- A Sign Language Recognition System with Pepper, Lightweight-Transformer, and LLM [0.9775599530257609]
This research explores using lightweight deep neural network architectures to enable the humanoid robot Pepper to understand American Sign Language (ASL).
We introduce a lightweight and efficient model for ASL understanding optimized for embedded systems, ensuring rapid sign recognition while conserving computational resources.
We tailor interactions to allow the Pepper Robot to generate natural Co-Speech Gesture responses, laying the foundation for more organic and intuitive humanoid-robot dialogues.
arXiv Detail & Related papers (2023-09-28T23:54:41Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) and existing visual grounding and robotic grasping system.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Surfer: Progressive Reasoning with World Models for Robotic Manipulation [51.26109827779267]
We introduce a novel and simple robot manipulation framework, called Surfer.
It is based on a world model: it treats robot manipulation as a state transfer of the visual scene and decouples it into two parts, action and scene.
arXiv Detail & Related papers (2023-06-20T07:06:04Z)
- Understanding Natural Language in Context [13.112390442564442]
We focus on cognitive robots, which have some knowledge-based models of the world and operate by reasoning and planning with this model.
Our goal in this research is to translate natural language utterances into this robot's formalism.
We do so by combining off-the-shelf SOTA language models, planning tools, and the robot's knowledge-base for better communication.
arXiv Detail & Related papers (2022-05-25T11:52:16Z)
- Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms that humans rely on when it comes to filtering speech in noisy environments.
Having a robot that can look toward a speaker could benefit ASR performance in challenging environments.
We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
arXiv Detail & Related papers (2020-11-12T18:02:15Z)
- Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks [83.37025218216888]
We propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations.
The robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communications.
Results show that the explanations generated by our approach significantly improve the collaboration performance and user perception of the robot.
arXiv Detail & Related papers (2020-07-24T23:35:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.