RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
- URL: http://arxiv.org/abs/2306.11706v2
- Date: Fri, 22 Dec 2023 13:55:42 GMT
- Title: RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
- Authors: Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin,
Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil
Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli,
Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan
Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed,
Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth,
Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf
Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg,
Raia Hadsell, Francesco Nori, Nicolas Heess
- Abstract summary: We propose a multi-embodiment, multi-task generalist agent for robotic manipulation called RoboCat.
This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions.
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1000 examples for the target task.
- Score: 33.10577695383743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to leverage heterogeneous robotic experience from different
robots and tasks to quickly master novel skills and embodiments has the
potential to transform robot learning. Inspired by recent advances in
foundation models for vision and language, we propose a multi-embodiment,
multi-task generalist agent for robotic manipulation. This agent, named
RoboCat, is a visual goal-conditioned decision transformer capable of consuming
action-labelled visual experience. This data spans a large repertoire of motor
control skills from simulated and real robotic arms with varying sets of
observations and actions. With RoboCat, we demonstrate the ability to
generalise to new tasks and robots, both zero-shot as well as through
adaptation using only 100-1000 examples for the target task. We also show how a
trained model itself can be used to generate data for subsequent training
iterations, thus providing a basic building block for an autonomous improvement
loop. We investigate the agent's capabilities, with large-scale evaluations
both in simulation and on three different real robot embodiments. We find that
as we grow and diversify its training data, RoboCat not only shows signs of
cross-task transfer, but also becomes more efficient at adapting to new tasks.
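
Two mechanisms in the abstract are concrete enough to sketch: a policy fine-tuned on only 100-1000 target-task examples, and a trained model that generates data for its own next training iteration. Below is a minimal Python sketch of that self-improvement loop; every name in it (Episode, finetune, collect_episodes) is a hypothetical stand-in, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Episode:
    """Action-labelled visual experience: frames paired with the actions taken."""
    observations: list
    actions: list
    reached_goal: bool  # success, judged against the visual goal

def self_improvement_loop(
    agent,
    finetune: Callable,          # hypothetical: returns an updated agent
    collect_episodes: Callable,  # hypothetical: rolls the agent out on a task
    base_dataset: List[Episode],
    demos: List[Episode],        # the 100-1000 target-task examples
    task,
    n_cycles: int = 3,
):
    """Sketch of the abstract's self-improvement loop (all names illustrative)."""
    dataset = list(base_dataset) + list(demos)
    for _ in range(n_cycles):
        agent = finetune(agent, dataset)
        episodes = collect_episodes(agent, task, num_episodes=500)
        # Fold only successful, action-labelled trajectories into the next round.
        dataset += [ep for ep in episodes if ep.reached_goal]
    return agent, dataset
```

Filtering rollouts by success before reusing them is one simple way to keep such a loop from amplifying its own mistakes; the paper's actual pipeline may curate self-generated data differently.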
Related papers
- π_0: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
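
Flow matching, named in the π_0 entry above, is standard enough to state concretely: learn a velocity field that transports Gaussian noise to data samples along a linear interpolation path. The sketch below is a generic conditional flow-matching loss for action generation; velocity_net and context are placeholders, and the paper's actual model conditions on features from a pre-trained VLM.

```python
import torch

def flow_matching_loss(velocity_net, actions, context):
    """Generic conditional flow-matching objective (illustrative, not pi_0's code).
    actions: (batch, action_dim) data samples; context: conditioning features."""
    noise = torch.randn_like(actions)             # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1)           # interpolation time per sample
    x_t = (1.0 - t) * noise + t * actions         # point on the noise-to-data path
    target_velocity = actions - noise             # d x_t / d t for the linear path
    pred = velocity_net(x_t, t, context)          # predicted velocity field
    return torch.mean((pred - target_velocity) ** 2)
```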
- Know your limits! Optimize the robot's behavior through self-awareness [11.021217430606042]
Recent human-robot imitation algorithms focus on following a reference human motion with high precision.
We introduce a deep-learning model that anticipates the robot's performance when imitating a given reference.
Our Self-AWare model (SAW) ranks potential robot behaviors based on various criteria, such as fall likelihood, adherence to the reference motion, and smoothness.
arXiv Detail & Related papers (2024-09-16T14:14:58Z)
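
The SAW entry above describes scoring candidate behaviors against several criteria; one simple way to realise that is a weighted cost over predicted metrics, sketched below. Both predict_metrics and the weights are hypothetical stand-ins, and the paper's model may combine criteria differently.

```python
def rank_behaviors(candidates, predict_metrics, weights=(1.0, 1.0, 0.5)):
    """Rank candidate imitation behaviors by a weighted cost over predicted
    fall likelihood, tracking error vs. the reference motion, and jerk
    (non-smoothness). Lower cost ranks first. All names are illustrative."""
    w_fall, w_track, w_smooth = weights

    def cost(behavior):
        fall_prob, tracking_error, jerk = predict_metrics(behavior)
        return w_fall * fall_prob + w_track * tracking_error + w_smooth * jerk

    return sorted(candidates, key=cost)  # best behavior first
```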
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
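
A common self-supervised objective for token sequences, and a plausible reading of the RPT entry above, is masked prediction: hide a random subset of the interleaved camera, proprioception, and action tokens and train the Transformer to reconstruct them. The sketch below assumes that mask-and-reconstruct setup rather than reproducing RPT's exact recipe.

```python
import torch

def masked_sensorimotor_loss(transformer, tokens, mask_ratio=0.5):
    """Masked-prediction pre-training over sensorimotor tokens (illustrative).
    transformer: any sequence model mapping (B, T, D) -> (B, T, D).
    tokens: interleaved camera / proprioception / action tokens."""
    batch, seq_len, _ = tokens.shape
    mask = torch.rand(batch, seq_len) < mask_ratio          # True = hidden
    inputs = tokens.masked_fill(mask.unsqueeze(-1), 0.0)    # zero out hidden tokens
    preds = transformer(inputs)
    per_token = ((preds - tokens) ** 2).mean(dim=-1)        # (B, T) errors
    return per_token[mask].mean()                           # loss on hidden tokens only
```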
- Zero-Shot Robot Manipulation from Passive Human Videos [59.193076151832145]
We develop a framework for extracting agent-agnostic action representations from human videos.
Our framework is based on predicting plausible human hand trajectories.
We deploy the trained model zero-shot for physical robot manipulation tasks.
arXiv Detail & Related papers (2023-02-03T21:39:52Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on large-scale data collected from real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
- PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training [25.50131893785007]
This work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously.
arXiv Detail & Related papers (2022-09-22T16:20:17Z)
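
The finetuning pattern reported in the PACT entry above, a small task-specific network on top of a larger frozen pretrained model, is easy to make concrete. The PyTorch sketch below uses assumed dimensions and module names, not the paper's exact architecture.

```python
import torch.nn as nn

class TaskHead(nn.Module):
    """Small task-specific head over a frozen pretrained trunk (illustrative)."""

    def __init__(self, pretrained_trunk: nn.Module, feat_dim: int = 512, out_dim: int = 7):
        super().__init__()
        self.trunk = pretrained_trunk
        for p in self.trunk.parameters():
            p.requires_grad = False          # keep the shared representation fixed
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, out_dim)
        )

    def forward(self, tokens):
        features = self.trunk(tokens)        # (B, T, feat_dim) representations
        return self.head(features[:, -1])    # predict from the most recent token
```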
- MetaMorph: Learning Universal Controllers with Transformers [45.478223199658785]
In robotics, we primarily train a single robot for a single task, but modular robot systems now allow for the flexible combination of general-purpose building blocks into task-optimized morphologies.
We propose MetaMorph, a Transformer based approach to learn a universal controller over a modular robot design space.
arXiv Detail & Related papers (2022-03-22T17:58:31Z)
- Lifelong Robotic Reinforcement Learning by Retaining Experiences [61.79346922421323]
Many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times.
In this work, we study a sequential multi-task RL problem motivated by the practical constraints of physical robotic systems.
We derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill-set.
arXiv Detail & Related papers (2021-09-19T18:00:51Z)
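
The retain-and-reuse recipe in the entry above fits in a few lines: experience gathered for earlier tasks is never discarded, and each new task trains on its own data plus everything retained so far. train_task below is a hypothetical per-task training routine, so this is a schematic rather than the paper's algorithm.

```python
from typing import Callable, List, Optional

def lifelong_training(tasks: list, train_task: Callable, retained: Optional[List] = None):
    """Sequential multi-task training that cumulatively grows the skill set
    by retaining all prior experience (illustrative sketch)."""
    retained = retained if retained is not None else []
    policies = []
    for task in tasks:
        # Bootstrap each new task from everything collected so far.
        policy, experience = train_task(task, replay_buffer=list(retained))
        retained.extend(experience)          # experiences are kept forever
        policies.append(policy)
    return policies, retained
```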
- Adaptation of Quadruped Robot Locomotion with Meta-Learning [64.71260357476602]
We demonstrate that meta-reinforcement learning can be used to successfully train a robot capable of solving a wide range of locomotion tasks.
The performance of the meta-trained robot is similar to that of a robot that is trained on a single task.
arXiv Detail & Related papers (2021-07-08T10:37:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.