From Play to Policy: Conditional Behavior Generation from Uncurated
Robot Data
- URL: http://arxiv.org/abs/2210.10047v2
- Date: Wed, 19 Oct 2022 16:57:25 GMT
- Title: From Play to Policy: Conditional Behavior Generation from Uncurated
Robot Data
- Authors: Zichen Jeff Cui, Yibin Wang, Nur Muhammad Mahi Shafiullah, Lerrel
Pinto
- Abstract summary: Conditional Behavior Transformers (C-BeT) is a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification.
C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%.
We demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data.
- Score: 18.041329181385414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large-scale sequence modeling from offline data has led to impressive
performance gains in natural language and image generation, directly
translating such ideas to robotics has been challenging. One critical reason
for this is that uncurated robot demonstration data, i.e. play data, collected
from non-expert human demonstrators are often noisy, diverse, and
distributionally multi-modal. This makes extracting useful, task-centric
behaviors from such data a difficult generative modeling problem. In this work,
we present Conditional Behavior Transformers (C-BeT), a method that combines
the multi-modal generation ability of Behavior Transformer with
future-conditioned goal specification. On a suite of simulated benchmark tasks,
we find that C-BeT improves upon prior state-of-the-art work in learning from
play data by an average of 45.7%. Further, we demonstrate for the first time
that useful task-centric behaviors can be learned on a real-world robot purely
from play data without any task labels or reward information. Robot videos are
best viewed on our project website: https://play-to-policy.github.io
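To make the future-conditioning idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of a goal-conditioned, BeT-style policy: the goal observation is prepended as an extra token, a causal transformer encodes the observation history, and the head outputs a categorical distribution over discretized action bins plus per-bin continuous offsets. Class names, dimensions, and the demo inputs are illustrative assumptions, not the authors' released implementation.

```python
# Minimal, hypothetical sketch of a future/goal-conditioned BeT-style policy.
# Names, dimensions, and the binning head are illustrative assumptions,
# not the authors' released code.
import torch
import torch.nn as nn


class GoalConditionedBeT(nn.Module):
    def __init__(self, obs_dim=10, goal_dim=10, act_dim=2, n_bins=32,
                 d_model=128, n_heads=4, n_layers=4, context_len=10):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.goal_proj = nn.Linear(goal_dim, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, context_len + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # BeT-style head: categorical over discretized action bins + per-bin offsets.
        self.bin_head = nn.Linear(d_model, n_bins)
        self.offset_head = nn.Linear(d_model, n_bins * act_dim)
        self.n_bins, self.act_dim = n_bins, act_dim

    def forward(self, obs_seq, goal):
        # obs_seq: (B, T, obs_dim); goal: (B, goal_dim), a future/target observation.
        B, T, _ = obs_seq.shape
        tokens = torch.cat(
            [self.goal_proj(goal).unsqueeze(1), self.obs_proj(obs_seq)], dim=1
        ) + self.pos_emb[:, : T + 1]
        # Causal mask: each step attends only to the goal token and past observations.
        mask = torch.triu(torch.full((T + 1, T + 1), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=mask)[:, 1:]  # drop the goal token's output
        bin_logits = self.bin_head(h)                                     # (B, T, n_bins)
        offsets = self.offset_head(h).view(B, T, self.n_bins, self.act_dim)
        return bin_logits, offsets


if __name__ == "__main__":
    policy = GoalConditionedBeT()
    obs, goal = torch.randn(4, 10, 10), torch.randn(4, 10)
    logits, offsets = policy(obs, goal)
    # At rollout time, sample a bin per step and add its offset to the bin center
    # (bin centers, e.g. from k-means over demonstrated actions, are omitted here).
    bins = torch.distributions.Categorical(logits=logits).sample()
    print(bins.shape, offsets.shape)
```

In training, the bin head would typically be supervised with a classification loss against the demonstrated action's bin and the offset head with a regression loss on the residual, with goals sampled as observations reached later in the same play trajectory.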
Related papers
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets [24.77850617214567]
We propose a foundation representation learning framework that captures both visual features and dynamics information, such as the actions and proprioceptive states involved in manipulation tasks.
Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions.
We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, and combine it with a behavior cloning (BC)-like actor loss that predicts actions during pre-training and a time contrastive loss.
arXiv Detail & Related papers (2024-10-29T17:58:13Z)
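As a rough illustration of the loss combination described in the entry above, the sketch below combines an InfoNCE-style term that aligns visual features with proprioceptive state-action dynamics, a BC-like action-prediction term, and a time contrastive term over adjacent frames. The encoders, loss weights, and shapes are placeholder assumptions, not the paper's actual architecture.

```python
# Rough, hypothetical sketch of the described loss combination: vision/dynamics
# alignment, a BC-like actor loss, and a time contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(queries, keys, temperature=0.1):
    # Matching (query_i, key_i) pairs are positives; all other keys are negatives.
    logits = queries @ keys.t() / temperature                # (B, B)
    targets = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, targets)


def pretraining_loss(vis_enc, dyn_enc, actor, frames, next_frames, proprio, actions,
                     w_dyn=1.0, w_bc=1.0, w_time=1.0):
    z_t = F.normalize(vis_enc(frames), dim=-1)               # visual features at time t
    z_next = F.normalize(vis_enc(next_frames), dim=-1)       # visual features at t+1
    # Dynamics features from proprioceptive state + action.
    d = F.normalize(dyn_enc(torch.cat([proprio, actions], dim=-1)), dim=-1)

    loss_dyn = info_nce(z_t, d)                # align vision with state-action dynamics
    loss_bc = F.mse_loss(actor(z_t), actions)  # BC-like actor loss
    loss_time = info_nce(z_t, z_next)          # temporally adjacent frames as positives
    return w_dyn * loss_dyn + w_bc * loss_bc + w_time * loss_time


if __name__ == "__main__":
    B, obs_dim, proprio_dim, act_dim, feat = 8, 64, 7, 7, 32
    vis_enc = nn.Linear(obs_dim, feat)         # stand-in for a visual encoder
    dyn_enc = nn.Linear(proprio_dim + act_dim, feat)
    actor = nn.Linear(feat, act_dim)
    loss = pretraining_loss(vis_enc, dyn_enc, actor,
                            torch.randn(B, obs_dim), torch.randn(B, obs_dim),
                            torch.randn(B, proprio_dim), torch.randn(B, act_dim))
    print(loss.item())
```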
- RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots [25.650235551519952]
We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments.
We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances.
Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning.
arXiv Detail & Related papers (2024-06-04T17:41:31Z)
- DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model [72.66465487508556]
DiffGen is a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model.
It can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation.
Experiments demonstrate that DiffGen can efficiently and effectively generate robot data with minimal human effort or training time.
arXiv Detail & Related papers (2024-05-12T15:38:17Z)
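The DiffGen entry above describes generating demonstrations by pushing a simulated observation's embedding toward the language instruction's embedding through a differentiable pipeline. The toy sketch below illustrates only that principle: every component (the "simulator", "renderer", and encoders) is a stand-in linear layer, not the framework's actual differentiable physics or rendering.

```python
# Toy, hypothetical illustration of the DiffGen idea: optimize actions by
# backpropagating through a differentiable simulate -> render -> embed pipeline
# so the observation embedding moves toward the language-instruction embedding.
import torch
import torch.nn as nn

state_dim, act_dim, img_dim, emb_dim = 8, 4, 16, 32

simulate = nn.Linear(state_dim + act_dim, state_dim)   # stand-in "physics"
render = nn.Linear(state_dim, img_dim)                 # stand-in "renderer"
img_encoder = nn.Linear(img_dim, emb_dim)              # stand-in vision encoder
text_embedding = torch.randn(emb_dim)                  # embedding of the instruction

# Freeze the pipeline; only the actions are optimized.
for module in (simulate, render, img_encoder):
    for p in module.parameters():
        p.requires_grad_(False)

state = torch.randn(state_dim)
actions = torch.zeros(act_dim, requires_grad=True)
opt = torch.optim.Adam([actions], lr=1e-2)

for step in range(200):
    next_state = simulate(torch.cat([state, actions]))
    obs_embedding = img_encoder(render(next_state))
    # Distance between instruction embedding and simulated-observation embedding.
    loss = 1 - torch.cosine_similarity(obs_embedding, text_embedding, dim=0)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))  # should decrease as the actions are refined
```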
- MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations [55.549956643032836]
MimicGen is a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations.
We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks.
arXiv Detail & Related papers (2023-10-26T17:17:31Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on a large-scale data collection from real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training [25.50131893785007]
This work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously.
arXiv Detail & Related papers (2022-09-22T16:20:17Z)
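The PACT entry above fine-tunes small task-specific networks on top of a larger pretrained model. The sketch below shows that general pattern only: a stand-in transformer trunk is frozen and a small head is trained on top; it is not PACT's actual tokenization or architecture.

```python
# Minimal, hypothetical sketch of fine-tuning a small task head on a frozen,
# pretrained perception-action trunk (the trunk here is a stand-in).
import torch
import torch.nn as nn

d_model, out_dim = 128, 6

trunk = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=4
)
for p in trunk.parameters():
    p.requires_grad_(False)              # keep the pretrained representation fixed

head = nn.Linear(d_model, out_dim)       # small task-specific head
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

tokens = torch.randn(8, 20, d_model)     # embedded perception-action token sequence
target = torch.randn(8, out_dim)         # placeholder task labels

pred = head(trunk(tokens)[:, -1])        # use the last token's representation
loss = nn.functional.mse_loss(pred, target)
opt.zero_grad()
loss.backward()
opt.step()
```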
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use a commercially available reacher-grabber assistive tool both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)