Autonomous Improvement of Instruction Following Skills via Foundation Models
- URL: http://arxiv.org/abs/2407.20635v2
- Date: Tue, 15 Oct 2024 17:54:17 GMT
- Title: Autonomous Improvement of Instruction Following Skills via Foundation Models
- Authors: Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, Sergey Levine,
- Abstract summary: Intelligent instruction-following robots capable of improving from autonomously collected experience have the potential to transform robot learning.
We propose a novel approach that allows instruction-following policies to improve from autonomously collected data without human supervision.
We carry out extensive experiments in the real world to demonstrate the effectiveness of our approach, and find that in a suite of unseen environments, the robot policy can be improved 2x with autonomously collected data.
- Score: 44.63552778566584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent instruction-following robots capable of improving from autonomously collected experience have the potential to transform robot learning: instead of collecting costly teleoperated demonstration data, large-scale deployment of fleets of robots can quickly collect larger quantities of autonomous data that can collectively improve their performance. However, autonomous improvement requires solving two key problems: (i) fully automating a scalable data collection procedure that can collect diverse and semantically meaningful robot data and (ii) learning from non-optimal, autonomous data with no human annotations. To this end, we propose a novel approach that addresses these challenges, allowing instruction-following policies to improve from autonomously collected data without human supervision. Our framework leverages vision-language models to collect and evaluate semantically meaningful experiences in new environments, and then utilizes a decomposition of instruction following tasks into (semantic) language-conditioned image generation and (non-semantic) goal reaching, which makes it significantly more practical to improve from this autonomously collected data without any human annotations. We carry out extensive experiments in the real world to demonstrate the effectiveness of our approach, and find that in a suite of unseen environments, the robot policy can be improved 2x with autonomously collected data. We open-source the code for our semantic autonomous improvement pipeline, as well as our autonomous dataset of 30.5K trajectories collected across five tabletop environments.
Related papers
- So You Think You Can Scale Up Autonomous Robot Data Collection? [22.7035324720716]
Reinforcement learning (RL) comes with the promise of enabling autonomous data collection.
It remains challenging to scale in the real-world partly due to the significant effort required for environment design and instrumentation.
Imitation learning (IL) methods require little to no environment design effort, but instead require significant human supervision.
arXiv Detail & Related papers (2024-11-04T05:31:35Z) - Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments [26.66666135624716]
We present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies.
RUMs can generalize to new environments without any finetuning.
We train five utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects.
arXiv Detail & Related papers (2024-09-09T17:59:50Z) - AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z) - Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.