RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation
- URL: http://arxiv.org/abs/2402.14623v1
- Date: Thu, 22 Feb 2024 15:12:00 GMT
- Title: RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation
- Authors: Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng
Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe
Xu, Mingyu Ding, Ping Luo
- Abstract summary: This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
- Score: 77.41969287400977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Embodied AI has seen rapid progress in high-level task planning and
code generation for open-world robot manipulation. However, previous studies
have devoted much effort to the general commonsense reasoning and task planning
capabilities of large-scale language or multi-modal models, and relatively
little to ensuring that generated code is deployable on real robots or to the
other fundamental components of autonomous robot systems, including robot
perception, motion planning, and control. To bridge this "ideal-to-real" gap,
this paper presents RobotScript, a platform for 1) a deployable robot
manipulation pipeline powered by code generation; and 2) a code generation
benchmark for robot manipulation tasks in free-form natural language. The
RobotScript platform addresses this gap through a unified interface to both
simulation and real robots, built on abstractions over the Robot Operating
System (ROS), which ensures syntax compliance and simulation validation with
Gazebo. We demonstrate the adaptability of our code generation framework
across multiple robot embodiments, including the Franka and UR5 robot arms, and
multiple grippers. Additionally, our benchmark assesses reasoning about
physical space and constraints, highlighting differences among GPT-3.5, GPT-4,
and Gemini in handling complex physical interactions. Finally, we present a
thorough evaluation of the whole system, exploring how each module in the
pipeline (code generation, perception, and motion planning) and even object
geometric properties impact the overall performance of the system.
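
To make the pipeline concrete, here is a minimal sketch of the kind of program such a platform consumes: an LLM-generated script calling perception and motion-planning primitives behind a unified sim/real interface. All names below (Pose, detect_object, move_to_pose, open_gripper, close_gripper) are hypothetical stand-ins for illustration, not RobotScript's actual API.

```python
# Illustrative sketch only: every name here is a hypothetical stand-in for
# the kind of unified sim/real interface the paper describes, not
# RobotScript's actual API.
from dataclasses import dataclass


@dataclass
class Pose:
    x: float  # position in the robot base frame (metres)
    y: float
    z: float


def detect_object(name: str) -> Pose:
    """Perception module: return a grasp pose for a named object."""
    raise NotImplementedError  # backed by cameras and a detector in practice


def move_to_pose(pose: Pose) -> None:
    """Motion-planning module: plan and execute a collision-free reach."""
    raise NotImplementedError  # backed by a ROS motion planner in practice


def open_gripper() -> None: ...
def close_gripper() -> None: ...


# The kind of program an LLM might generate for "put the apple in the bowl":
def put_apple_in_bowl() -> None:
    apple = detect_object("apple")
    move_to_pose(apple)
    close_gripper()  # grasp the apple
    bowl = detect_object("bowl")
    move_to_pose(Pose(bowl.x, bowl.y, bowl.z + 0.10))  # hover above the bowl
    open_gripper()  # release
```

Because the same interface is backed by ROS both in Gazebo and on hardware, a generated script of this shape can be validated in simulation before it ever runs on a Franka or UR5.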
Related papers
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a wide variety of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z) - RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units, each annotated with physical preferences such as affordances and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
arXiv Detail & Related papers (2024-02-25T15:31:43Z) - Creative Robot Tool Use with Large Language Models [47.11935262923095]
- Creative Robot Tool Use with Large Language Models [47.11935262923095]
This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning.
We develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments.
arXiv Detail & Related papers (2023-10-19T18:02:15Z) - Prompt a Robot to Walk with Large Language Models [19.89815242751014]
Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains.
We introduce a novel paradigm in which we use few-shot prompts collected from the physical environment.
Experiments across various robots and environments validate that our method can effectively prompt a robot to walk.
arXiv Detail & Related papers (2023-09-18T17:50:17Z) - WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z) - SEAL: Semantic Frame Execution And Localization for Perceiving Afforded
Robot Actions [5.522839151632667]
We extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model.
For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot.
arXiv Detail & Related papers (2023-03-24T15:25:41Z) - ProgPrompt: Generating Situated Robot Task Plans using Large Language
Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables generated plans to function across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z) - robo-gym -- An Open Source Toolkit for Distributed Deep Reinforcement
- robo-gym -- An Open Source Toolkit for Distributed Deep Reinforcement Learning on Real and Simulated Robots [0.5161531917413708]
We propose robo-gym, an open source toolkit to increase the use of Deep Reinforcement Learning with real robots.
We demonstrate a unified setup for simulation and real environments which enables a seamless transfer from training in simulation to application on the robot.
We showcase the capabilities and the effectiveness of the framework with two real world applications featuring industrial robots.
arXiv Detail & Related papers (2020-07-06T13:51:33Z)