RT-1: Robotics Transformer for Real-World Control at Scale
- URL: http://arxiv.org/abs/2212.06817v2
- Date: Fri, 11 Aug 2023 17:45:27 GMT
- Title: RT-1: Robotics Transformer for Real-World Control at Scale
- Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph
Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog,
Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally
Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang,
Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha
Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta,
Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo,
Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke,
Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan
Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
- Abstract summary: We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
- Score: 98.09428483862165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern
machine learning models can solve specific downstream tasks either zero-shot or
with small task-specific datasets to a high level of performance. While this
capability has been demonstrated in other fields such as computer vision,
natural language processing or speech recognition, it remains to be shown in
robotics, where the generalization capabilities of the models are particularly
critical due to the difficulty of collecting real-world robotic data. We argue
that one of the keys to the success of such general robotic models lies with
open-ended task-agnostic training, combined with high-capacity architectures
that can absorb all of the diverse, robotic data. In this paper, we present a
model class, dubbed Robotics Transformer, that exhibits promising scalable
model properties. We verify our conclusions in a study of different model
classes and their ability to generalize as a function of the data size, model
size, and data diversity based on a large-scale data collection on real robots
performing real-world tasks. The project's website and videos can be found at
robotics-transformer1.github.io
Related papers
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z) - RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in
One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z) - RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation [33.10577695383743]
We propose a multi-embodiment, multi-task generalist agent for robotic manipulation called RoboCat.
This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions.
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples.
arXiv Detail & Related papers (2023-06-20T17:35:20Z) - Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z) - PACT: Perception-Action Causal Transformer for Autoregressive Robotics
Pre-Training [25.50131893785007]
This work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously.
arXiv Detail & Related papers (2022-09-22T16:20:17Z) - Factored World Models for Zero-Shot Generalization in Robotic
Manipulation [7.258229016768018]
We learn to generalize over robotic pick-and-place tasks using object-factored world models.
We use a residual stack of graph neural networks that receive action information at multiple levels in both their node and edge neural networks.
We show that an ensemble of our models can be used to plan for tasks involving up to 12 pick and place actions using search.
arXiv Detail & Related papers (2022-02-10T21:26:11Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z) - Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human
Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z) - Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.