RT-1: Robotics Transformer for Real-World Control at Scale
- URL: http://arxiv.org/abs/2212.06817v2
- Date: Fri, 11 Aug 2023 17:45:27 GMT
- Title: RT-1: Robotics Transformer for Real-World Control at Scale
- Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph
Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog,
Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally
Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang,
Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha
Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta,
Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo,
Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke,
Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan
Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
- Abstract summary: We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
- Score: 98.09428483862165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern
machine learning models can solve specific downstream tasks either zero-shot or
with small task-specific datasets to a high level of performance. While this
capability has been demonstrated in other fields such as computer vision,
natural language processing or speech recognition, it remains to be shown in
robotics, where the generalization capabilities of the models are particularly
critical due to the difficulty of collecting real-world robotic data. We argue
that one of the keys to the success of such general robotic models lies with
open-ended task-agnostic training, combined with high-capacity architectures
that can absorb all of the diverse robotic data. In this paper, we present a
model class, dubbed Robotics Transformer, that exhibits promising scalable
model properties. We verify our conclusions in a study of different model
classes and their ability to generalize as a function of the data size, model
size, and data diversity based on a large-scale data collection on real robots
performing real-world tasks. The project's website and videos can be found at
robotics-transformer1.github.io
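To make the abstract's claim concrete, below is a minimal, illustrative sketch of the kind of high-capacity transformer policy it describes: tokenized image-and-instruction inputs feed a Transformer encoder that emits a discretized action. This is an assumption-laden toy in PyTorch, not the released RT-1 implementation; all module names, dimensions, and the tokenization scheme are placeholders.

```python
# Illustrative sketch only: a transformer policy in the spirit of the abstract,
# mapping image+instruction tokens to discretized action tokens.
# Every name and dimension here is an assumption, not the RT-1 code.
import torch
import torch.nn as nn

class TransformerPolicy(nn.Module):
    def __init__(self, vocab_size=512, d_model=256, n_actions=11, n_bins=256):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=8)
        # One discretized distribution per action dimension
        # (e.g. arm pose components, gripper).
        self.action_head = nn.Linear(d_model, n_actions * n_bins)
        self.n_actions, self.n_bins = n_actions, n_bins

    def forward(self, tokens):
        # tokens: (batch, seq) ids from a stand-in image+instruction tokenizer
        h = self.encoder(self.token_embed(tokens))
        logits = self.action_head(h[:, -1])  # read out from the last token
        return logits.view(-1, self.n_actions, self.n_bins)

policy = TransformerPolicy()
tokens = torch.randint(0, 512, (1, 48))      # stand-in tokenized inputs
action_bins = policy(tokens).argmax(-1)      # (1, n_actions) bin indices
```

The point of the discretized action head is that control becomes a classification problem per action dimension, which is what lets a single sequence model absorb large, diverse demonstration datasets.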
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments [26.66666135624716]
We present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies.
RUMs can generalize to new environments without any finetuning.
We train five utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects.
arXiv Detail & Related papers (2024-09-09T17:59:50Z)
- Semantically Controllable Augmentations for Generalizable Robot Learning [40.89398799604755]
Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training.
We propose a generative augmentation framework that produces semantically controllable augmentations and rapidly multiplies robot datasets.
arXiv Detail & Related papers (2024-09-02T05:25:34Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs enables instruction-following data-collection robots that can align to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo (an illustrative sketch of the same-task discriminator appears after this list).
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
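As promised in the DVD entry above, here is a minimal, illustrative sketch of that idea: two video clips are encoded and a binary head scores whether they show the same task, and that score can serve as a reward. The architecture, names, and shapes are placeholder assumptions, not the authors' released code.

```python
# Hedged sketch of a DVD-style same-task discriminator as a reward function.
# Encoder, head, and all dimensions are illustrative stand-ins.
import torch
import torch.nn as nn

class VideoDiscriminator(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Stand-in per-video encoder; a real system would use a video backbone.
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.head = nn.Linear(2 * feat_dim, 1)  # logit: "same task?"

    def forward(self, video_a, video_b):
        za, zb = self.encoder(video_a), self.encoder(video_b)
        return self.head(torch.cat([za, zb], dim=-1)).squeeze(-1)

disc = VideoDiscriminator()
human_demo = torch.randn(1, 8, 3, 64, 64)   # (batch, frames, C, H, W)
robot_clip = torch.randn(1, 8, 3, 64, 64)
reward = torch.sigmoid(disc(human_demo, robot_clip))  # same-task probability
```

Training the discriminator on pairs of human videos labeled only as same-task or different-task is what lets the reward generalize from a small amount of robot data, as the summary above describes.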
This list is automatically generated from the titles and abstracts of the papers on this site.