Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration
- URL: http://arxiv.org/abs/2406.14097v2
- Date: Mon, 1 Jul 2024 06:11:31 GMT
- Title: Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration
- Authors: Haokun Liu, Yaonan Zhu, Kenji Kato, Atsushi Tsukahara, Izumi Kondo, Tadayoshi Aoyama, Yasuhisa Hasegawa
- Abstract summary: Large Language Models (LLMs) are gaining popularity in the field of robotics.
This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC).
The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot.
- Score: 4.2460673279562755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are gaining popularity in the field of robotics. However, LLM-based robots are limited to simple, repetitive motions due to the poor integration between language models, robots, and the environment. This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC). The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot. The system also employs a YOLO-based perception algorithm, providing visual cues to the LLM, which aids in planning feasible motions within the specific environment. Additionally, an HRC method is proposed by combining teleoperation and Dynamic Movement Primitives (DMP), allowing the LLM-based robot to learn from human guidance. Real-world experiments have been conducted using the Toyota Human Support Robot for manipulation tasks. The outcomes indicate that tasks requiring complex trajectory planning and reasoning over environments can be efficiently accomplished through the incorporation of human demonstrations.
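The abstract outlines a perception-to-planning pipeline (YOLO detections serving as visual cues for a prompted GPT-4 planner) and a human-guidance component based on Dynamic Movement Primitives (DMP). Below is a minimal sketch of the first part; it is not the authors' released code, and the primitive names (move_to, grasp, place), the prompt wording, the model identifier, and the YOLO weights are assumptions made purely for illustration.

```python
# Minimal sketch (assumed names and prompts, not the paper's implementation):
# serialize YOLO detections into a textual scene description, then ask a
# prompted GPT-4 model to decompose a high-level command into motion primitives.

import json
from ultralytics import YOLO   # YOLOv8-style detector
from openai import OpenAI

detector = YOLO("yolov8n.pt")  # illustrative weights
client = OpenAI()

SYSTEM_PROMPT = (
    "You control a mobile manipulator. Given a scene description and a task, "
    "reply with a JSON list of motions, each {\"action\": one of "
    "[move_to, grasp, place], \"target\": object name}."
)

def describe_scene(image_path: str) -> str:
    """Run YOLO and turn detections into a short textual scene description."""
    result = detector(image_path)[0]
    lines = []
    for box in result.boxes:
        name = result.names[int(box.cls)]
        x1, y1, x2, y2 = [round(float(v), 1) for v in box.xyxy[0]]
        lines.append(f"{name} at bbox ({x1}, {y1}, {x2}, {y2})")
    return "; ".join(lines) if lines else "no objects detected"

def decompose(command: str, image_path: str) -> list[dict]:
    """Ask GPT-4 to decompose a high-level command into primitive motions."""
    scene = describe_scene(image_path)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Scene: {scene}\nTask: {command}"},
        ],
    )
    # Assumes the model replies with bare JSON; a sketch, so no retry logic.
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    for step in decompose("put the cup on the tray", "scene.jpg"):
        print(step)  # e.g. {"action": "move_to", "target": "cup"}
```

For the human-guidance side, the paper combines teleoperation with DMPs so the robot can learn from demonstrations. The sketch below is a compact one-dimensional discrete DMP, fitted to a demonstrated trajectory and replayed toward a (possibly new) goal; the gains, basis-function count, and width heuristic are illustrative defaults, not values reported in the paper.

```python
# Minimal 1-D discrete DMP sketch: fit a forcing term to a demonstration,
# then roll the motion out toward a goal. Gains/widths are illustrative.
import numpy as np

class DMP1D:
    def __init__(self, n_basis=20, alpha_y=25.0, beta_y=6.25, alpha_x=3.0):
        self.alpha_y, self.beta_y, self.alpha_x = alpha_y, beta_y, alpha_x
        # Basis-function centers spaced along the canonical variable x in (0, 1].
        self.centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.widths = n_basis ** 1.5 / self.centers   # common width heuristic
        self.weights = np.zeros(n_basis)

    def _features(self, x):
        psi = np.exp(-self.widths * (x - self.centers) ** 2)
        return psi * x / (psi.sum() + 1e-10)

    def fit(self, y_demo, dt):
        """Learn forcing-term weights from a demonstrated trajectory y_demo."""
        self.y0, self.goal = y_demo[0], y_demo[-1]
        yd = np.gradient(y_demo, dt)
        ydd = np.gradient(yd, dt)
        # Invert the transformation system to get the target forcing term.
        f_target = ydd - self.alpha_y * (self.beta_y * (self.goal - y_demo) - yd)
        x = np.exp(-self.alpha_x * np.linspace(0.0, 1.0, len(y_demo)))
        Phi = np.stack([self._features(xi) for xi in x])   # (T, n_basis)
        self.weights, *_ = np.linalg.lstsq(Phi, f_target, rcond=None)

    def rollout(self, steps, dt, goal=None):
        """Reproduce the motion, optionally toward a new goal."""
        goal = self.goal if goal is None else goal
        y, yd, x, traj = self.y0, 0.0, 1.0, []
        for _ in range(steps):
            f = self._features(x) @ self.weights
            ydd = self.alpha_y * (self.beta_y * (goal - y) - yd) + f
            yd += ydd * dt
            y += yd * dt
            x += -self.alpha_x * x * dt   # canonical system decay
            traj.append(y)
        return np.array(traj)
```

In a setup like the one described, a teleoperated end-effector trajectory could serve as y_demo (one DMP per dimension), and the rollout goal could be taken from the pose of a YOLO-detected object.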
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation [13.181465089984567]
Large Language Models (LLMs) have been recently used in robot applications for grounding common-sense reasoning with the robot's perception and physical abilities.
In this paper, we address incorporating memory processes into LLMs to generate cross-task robot actions while the robot switches between tasks.
Our results show a significant improvement in performance over a baseline on five robotic tasks, demonstrating the potential of integrating memory with LLMs to combine the robot's actions and perception for adaptive task execution.
arXiv Detail & Related papers (2024-07-18T13:38:21Z) - LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application [4.519544934630495]
Large Language Models (LLMs) and Vision Language Models (VLMs) enable robots to ground natural language prompts into control actions.
We propose Language-driven Intention Tracking (LIT) to model the human user's long-term behavior and to predict the next human intention to guide the robot for proactive collaboration.
arXiv Detail & Related papers (2024-06-19T19:18:40Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - Vision-Language Foundation Models as Effective Robot Imitators [48.73027330407576]
We derive a vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLM OpenFlamingo.
By exceeding state-of-the-art performance by a large margin on the tested benchmark, we show that RoboFlamingo can be an effective and competitive alternative for adapting VLMs to robot control.
arXiv Detail & Related papers (2023-11-02T16:34:33Z) - Prompt a Robot to Walk with Large Language Models [18.214609570837403]
Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains.
We introduce a novel paradigm in which we use few-shot prompts collected from the physical environment.
Experiments across various robots and environments validate that our method can effectively prompt a robot to walk.
arXiv Detail & Related papers (2023-09-18T17:50:17Z) - WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z) - LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks [4.4589894340260585]
This paper presents a novel approach to enhance autonomous robotic manipulation using the Large Language Model (LLM) for logical inference.
The proposed system combines the advantages of LLMs with YOLO-based environmental perception to enable robots to autonomously make reasonable decisions.
arXiv Detail & Related papers (2023-08-29T01:54:49Z) - Language to Rewards for Robotic Skill Synthesis [37.21434094015743]
We introduce a new paradigm that harnesses large language models (LLMs) to define reward parameters that can be optimized to accomplish a variety of robotic tasks.
Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions.
arXiv Detail & Related papers (2023-06-14T17:27:10Z) - Show Me What You Can Do: Capability Calibration on Reachable Workspace for Human-Robot Collaboration [83.4081612443128]
We show that a short calibration using REMP can effectively bridge the gap between what a non-expert user thinks a robot can reach and the ground truth.
We show that this calibration procedure not only results in better user perception, but also promotes more efficient human-robot collaborations.
arXiv Detail & Related papers (2021-03-06T09:14:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.