GenSim: Generating Robotic Simulation Tasks via Large Language Models
- URL: http://arxiv.org/abs/2310.01361v2
- Date: Sun, 21 Jan 2024 21:01:12 GMT
- Title: GenSim: Generating Robotic Simulation Tasks via Large Language Models
- Authors: Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao,
Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang
- Abstract summary: GenSim aims to automatically generate rich simulation environments and expert demonstrations.
We use GPT4 to expand the existing benchmark by ten times to over 100 tasks.
With minimal sim-to-real adaptation, multitask policies pretrained on GPT4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world.
- Score: 34.79613485106202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting large amounts of real-world interaction data to train general
robotic policies is often prohibitively expensive, thus motivating the use of
simulation data. However, existing methods for data generation have generally
focused on scene-level diversity (e.g., object instances and poses) rather than
task-level diversity, due to the human effort required to come up with and
verify novel tasks. This has made it challenging for policies trained on
simulation data to demonstrate significant task-level generalization. In this
paper, we propose to automatically generate rich simulation environments and
expert demonstrations by exploiting a large language model's (LLM) grounding
and coding ability. Our approach, dubbed GenSim, has two modes: goal-directed
generation, wherein a target task is given to the LLM and the LLM proposes a
task curriculum to solve the target task, and exploratory generation, wherein
the LLM bootstraps from previous tasks and iteratively proposes novel tasks
that would be helpful in solving more complex tasks. We use GPT4 to expand the
existing benchmark by ten times to over 100 tasks, on which we conduct
supervised finetuning and evaluate several LLMs including finetuned GPTs and
Code Llama on code generation for robotic simulation tasks. Furthermore, we
observe that LLM-generated simulation programs can enhance task-level
generalization significantly when used for multitask policy training. We
further find that with minimal sim-to-real adaptation, the multitask policies
pretrained on GPT4-generated simulation tasks exhibit stronger transfer to
unseen long-horizon tasks in the real world and outperform baselines by 25%.
See the project website (https://liruiw.github.io/gensim) for code, demos, and
videos.
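As a rough illustration of the two generation modes described in the abstract, here is a minimal Python sketch assuming an OpenAI-style chat API. The prompt text, function names, and seed task are hypothetical, not GenSim's actual implementation; the real pipeline, prompt templates, and verification code are on the project website.

```python
# Hypothetical sketch of GenSim-style task generation; prompts and helpers
# are illustrative, not the paper's actual code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You write new tabletop manipulation tasks as Python classes for a "
    "robotic simulator. Return only runnable code."
)

def propose_task(mode: str, context: str) -> str:
    """Ask the LLM for a new simulation task in one of the two modes."""
    if mode == "goal-directed":
        # A target task is given; the LLM proposes a curriculum step toward it.
        user_prompt = (
            f"Target task: {context}\n"
            "Propose the next task in a curriculum toward this target."
        )
    else:
        # Exploratory mode: bootstrap novel tasks from the existing library.
        user_prompt = (
            f"Existing tasks:\n{context}\n"
            "Propose one novel task that builds on these."
        )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

# Exploratory bootstrapping: each accepted task enriches the prompt context.
task_library = ["place-red-block-in-bowl"]  # hypothetical seed task
generated_code = propose_task("exploratory", "\n".join(task_library))
# In the full system, generated task code would be executed in simulation and
# verified (e.g., expert demonstrations succeed) before joining the library.
```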
Related papers
- Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation [48.17611255751571]
Post-training is essential for enabling large language models to follow human instructions.
We leverage multi-agent simulation to automatically generate diverse text-based scenarios.
We introduce a novel scenario-driven instruction generator MATRIX-Gen for controllable and highly realistic data synthesis.
arXiv Detail & Related papers (2024-10-18T08:01:39Z) - GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs [38.281562732050084]
GenSim2 is a scalable framework for complex and realistic simulation task creation.
The pipeline can generate data for up to 100 articulated tasks with 200 objects while reducing the required human effort.
We demonstrate a promising use of GenSim2: the generated data can be used for zero-shot transfer or co-trained with real-world collected data.
arXiv Detail & Related papers (2024-10-04T17:51:33Z) - Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z) - Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, the Sparse Diffusion Policy (SDP).
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and code can be found at https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z) - MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z) - Interactive Planning Using Large Language Models for Partially
Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - Generalizable Long-Horizon Manipulations with Large Language Models [91.740084601715]
This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations.
We create a challenging robotic manipulation task suite based on PyBullet for long-horizon task evaluation.
arXiv Detail & Related papers (2023-10-03T17:59:46Z) - Reactive Long Horizon Task Execution via Visual Skill and Precondition
Models [59.76233967614774]
We describe an approach for sim-to-real training that can accomplish unseen robotic tasks using models learned in simulation to ground components of a simple task planner.
We show an increase in success rate from 91.6% to 98% in simulation and from 10% to 80% in the real world compared with naive baselines.
arXiv Detail & Related papers (2020-11-17T15:24:01Z)