Related papers: JUICER: Data-Efficient Imitation Learning for Robotic Assembly

URL: http://arxiv.org/abs/2404.03729v3
Date: Mon, 11 Nov 2024 14:09:00 GMT
Title: JUICER: Data-Efficient Imitation Learning for Robotic Assembly
Authors: Lars Ankile, Anthony Simeonov, Idan Shenfeld, Pulkit Agrawal
Abstract summary: This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps.
Score: 21.43402768760014
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisely grasping, reorienting, and inserting multiple parts over long horizons and multiple task phases. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. These help expand dataset support and supervise the model with locally corrective actions near bottleneck regions requiring high precision. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps directly from RGB images, outperforming imitation and data augmentation baselines. Project website: https://imitation-juicer.github.io/.

Related papers

What Matters in Learning from Large-Scale Datasets for Robot Manipulation [12.703188997313223]
We conduct a large-scale dataset composition study to answer this question. We develop a data generation framework to procedurally emulate common sources of diversity in existing datasets. We find that camera poses and spatial arrangements are crucial dimensions for both diversity in collection and alignment in retrieval.
arXiv Detail & Related papers (2025-06-16T14:25:29Z)
Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space [16.787049521081983]
Imitation learning (IL) with human demonstrations is a promising method for robotic manipulation tasks. We introduce a Hierarchical Data Collection Space (HD-Space) for robotic imitation learning, a simple data collection scheme. We conduct empirical evaluations across two simulated and five real-world long-horizon manipulation tasks.
arXiv Detail & Related papers (2025-05-23T01:57:45Z)
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation [57.34255010956452]
This work revisits scaling with synthetic data and focuses on developing video-LLMs from a data-centric perspective. We propose a data augmentation method called Sparrow, which synthesizes video-like samples from pure text instruction data. Our proposed method achieves performance comparable to or even superior to that of baselines trained with significantly more samples.
arXiv Detail & Related papers (2024-11-29T18:59:54Z)
λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics [11.901933884058021]
We introduce the LAMBDA benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities). This benchmark evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings.
arXiv Detail & Related papers (2024-11-28T19:31:50Z)
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs [38.281562732050084]
GenSim2 is a scalable framework for complex and realistic simulation task creation. The pipeline can generate data for up to 100 articulated tasks with 200 objects and reduce the required human effort. We show a promising use of GenSim2: the generated data can be used for zero-shot transfer or for co-training with real-world collected data.
arXiv Detail & Related papers (2024-10-04T17:51:33Z)
Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame. ATM outperforms strong video pre-training baselines by 80% on average. We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
GenSim: Generating Robotic Simulation Tasks via Large Language Models [34.79613485106202]
GenSim aims to automatically generate rich simulation environments and expert demonstrations. We use GPT4 to expand the existing benchmark by ten times to over 100 tasks. With minimal sim-to-real adaptation, multitask policies pretrained on GPT4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world.
arXiv Detail & Related papers (2023-10-02T17:23:48Z)
Imitating Task and Motion Planning with Visuomotor Transformers [71.41938181838124]
Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. We present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent.
arXiv Detail & Related papers (2023-05-25T17:58:14Z)
CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning [33.88636835443266]
We propose a framework to better scale up robot learning under the lens of multi-task, multi-scene robot manipulation in kitchen environments. Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training. In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage.
arXiv Detail & Related papers (2022-12-12T05:30:08Z)
Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition. Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss. We verify the effectiveness of our method on five challenging datasets, Kinetics-400, Kinetics-700, Moments-in-Time, Activitynet and Something-something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets. We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots. We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector. We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.