JUICER: Data-Efficient Imitation Learning for Robotic Assembly
- URL: http://arxiv.org/abs/2404.03729v3
- Date: Mon, 11 Nov 2024 14:09:00 GMT
- Title: JUICER: Data-Efficient Imitation Learning for Robotic Assembly
- Authors: Lars Ankile, Anthony Simeonov, Idan Shenfeld, Pulkit Agrawal
- Abstract summary: This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps.
- Score: 21.43402768760014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisely grasping, reorienting, and inserting multiple parts over long horizons and multiple task phases. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. These help expand dataset support and supervise the model with locally corrective actions near bottleneck regions requiring high precision. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps directly from RGB images, outperforming imitation and data augmentation baselines. Project website: https://imitation-juicer.github.io/.
Related papers
- λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics [11.901933884058021]
We introduce the LAMBDA benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities).
This benchmark evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks.
Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings.
arXiv Detail & Related papers (2024-11-28T19:31:50Z)
- GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs [38.281562732050084]
GenSim2 is a scalable framework for complex and realistic simulation task creation.
The pipeline can generate data for up to 100 articulated tasks with 200 objects while reducing the required human effort.
We show a promising use of GenSim2: the generated data can be used for zero-shot transfer or for co-training with real-world data.
arXiv Detail & Related papers (2024-10-04T17:51:33Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- GenSim: Generating Robotic Simulation Tasks via Large Language Models [34.79613485106202]
GenSim aims to automatically generate rich simulation environments and expert demonstrations.
We use GPT-4 to expand the existing benchmark tenfold, to over 100 tasks.
With minimal sim-to-real adaptation, multitask policies pretrained on GPT-4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world.
arXiv Detail & Related papers (2023-10-02T17:23:48Z)
- Imitating Task and Motion Planning with Visuomotor Transformers [71.41938181838124]
Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations.
In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation.
We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent.
arXiv Detail & Related papers (2023-05-25T17:58:14Z)
- CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning [33.88636835443266]
We propose a framework to better scale up robot learning through the lens of multi-task, multi-scene robot manipulation in kitchen environments.
Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training.
In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage.
arXiv Detail & Related papers (2022-12-12T05:30:08Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of learning robust feature representations that generalize well across multiple action recognition datasets.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.