Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning
- URL: http://arxiv.org/abs/2303.01488v1
- Date: Thu, 2 Mar 2023 18:51:38 GMT
- Title: Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning
- Authors: Archit Sharma, Ahmed M. Ahmed, Rehaan Ahmad, Chelsea Finn
- Abstract summary: In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
- Score: 54.636562516974884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In imitation and reinforcement learning, the cost of human supervision limits
the amount of data that robots can be trained on. An aspirational goal is to
construct self-improving robots: robots that can learn and improve on their
own, from autonomous interaction with minimal human supervision or oversight.
Such robots could collect and train on much larger datasets, and thus learn
more robust and performant policies. While reinforcement learning offers a
framework for such autonomous learning via trial-and-error, practical
realizations end up requiring extensive human supervision for reward function
design and repeated resetting of the environment between episodes of
interactions. In this work, we propose MEDAL++, a novel design for
self-improving robotic systems: given a small set of expert demonstrations at
the start, the robot autonomously practices the task by learning to both do and
undo the task, simultaneously inferring the reward function from the
demonstrations. The policy and reward function are learned end-to-end from
high-dimensional visual inputs, bypassing the need for explicit state
estimation or task-specific pre-training for visual encoders used in prior
work. We first evaluate our proposed algorithm on EARL, a simulated
non-episodic benchmark, finding that MEDAL++ is both more data efficient and
achieves up to 30% better final performance than state-of-the-art vision-based
methods.
Our real-robot experiments show that MEDAL++ can be applied to manipulation
problems in larger environments than those considered in prior work, and
autonomous self-improvement can improve the success rate by 30-70% over
behavior cloning on just the expert data. Code, training and evaluation
videos, along with a brief overview, are available at:
https://architsharma97.github.io/self-improving-robots/
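As a concrete illustration of the do/undo practice loop, below is a minimal
PyTorch-style sketch assuming a discriminator-style reward that scores how
demonstration-like the robot's states look. The network shapes, the reward
form, and the omitted actor-critic update are illustrative assumptions, not
the exact MEDAL++ recipe.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    obs_dim, act_dim = 64, 7                 # stand-ins for encoded image features
    forward_pi = nn.Linear(obs_dim, act_dim)  # "do the task" policy head
    backward_pi = nn.Linear(obs_dim, act_dim) # "undo the task" policy head
    disc = nn.Linear(obs_dim, 1)              # demo-vs-robot state discriminator
    opt = torch.optim.Adam(
        [*forward_pi.parameters(), *backward_pi.parameters(), *disc.parameters()],
        lr=3e-4)

    def inferred_reward(obs):
        # Log-odds that `obs` looks like a demonstration state.
        return disc(obs).squeeze(-1)

    def discriminator_step(demo_obs, robot_obs):
        # Demo states are positives; states the robot visits are negatives.
        logits = disc(torch.cat([demo_obs, robot_obs])).squeeze(-1)
        labels = torch.cat([torch.ones(len(demo_obs)),
                            torch.zeros(len(robot_obs))])
        loss = F.binary_cross_entropy_with_logits(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # Autonomous practice alternates the two policies without manual resets:
    # run forward_pi to attempt the task, then backward_pi to undo it, training
    # both with an off-policy RL update (omitted) on inferred_reward(obs).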
Related papers
- Generalized Robot Learning Framework [10.03174544844559]
We present a low-cost robot learning framework that is both easily reproducible and transferable to various robots and environments.
We demonstrate that deployable imitation learning can be successfully applied even to industrial-grade robots.
arXiv Detail & Related papers (2024-09-18T15:34:31Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
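The offline-to-online recipe lends itself to a compact sketch: pre-train an
actor-critic on an existing robot dataset with a conservative objective, then
keep running the same updates on autonomously collected experience. The
conservative penalty, network shapes, and two-phase schedule below are
assumptions for illustration, not RoboFuME's exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    q = nn.Linear(64 + 7, 1)   # critic over (observation, action) features
    pi = nn.Linear(64, 7)      # policy head
    q_opt = torch.optim.Adam(q.parameters(), lr=3e-4)

    def critic_loss(obs, act, td_target, conservative_weight=1.0):
        pred = q(torch.cat([obs, act], dim=-1)).squeeze(-1)
        td = F.mse_loss(pred, td_target)
        # Conservative term (CQL-style assumption): push Q down on the
        # policy's own actions so the offline-trained critic does not
        # overestimate actions missing from the dataset.
        q_pi = q(torch.cat([obs, pi(obs)], dim=-1)).mean()
        return td + conservative_weight * (q_pi - pred.mean())

    # Phase 1: many critic/actor updates on the offline robot dataset.
    # Phase 2: continue the same updates on autonomously collected rollouts,
    # typically relaxing conservative_weight as fresh online data accumulates.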
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods [14.780597545674157]
We investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives.
We propose a visual pre-training scheme for robot manipulation termed Vi-PRoM, which combines self-supervised learning and supervised learning.
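As a rough illustration of combining the two signals, a joint objective might
weight an InfoNCE term over augmented views against a standard supervised
classification term. The encoder, heads, loss weighting, and label source
below are assumptions, not Vi-PRoM's published losses.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Linear(3 * 32 * 32, 128)  # stand-in for a conv backbone
    classifier = nn.Linear(128, 10)        # supervised head (e.g., semantic labels)

    def joint_loss(view_a, view_b, images, labels, temp=0.1, alpha=0.5):
        # Self-supervised term: two augmented views of the same image should
        # embed nearby (InfoNCE over the batch).
        za = F.normalize(encoder(view_a), dim=-1)
        zb = F.normalize(encoder(view_b), dim=-1)
        logits = za @ zb.t() / temp
        targets = torch.arange(len(za))
        ssl = F.cross_entropy(logits, targets)
        # Supervised term: standard classification on labeled images.
        sup = F.cross_entropy(classifier(encoder(images)), labels)
        return alpha * ssl + (1 - alpha) * sup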
arXiv Detail & Related papers (2023-08-07T14:24:52Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
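A minimal sketch of this idea, assuming a masked-prediction objective over
interleaved sensorimotor tokens; the token sizes, masking rate, and the tiny
Transformer are stand-ins rather than RPT's actual architecture.

    import torch
    import torch.nn as nn

    d = 128
    tok = nn.Linear(16, d)   # per-token projection (camera/proprio/action stand-in)
    block = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
    trunk = nn.TransformerEncoder(block, num_layers=2)
    head = nn.Linear(d, 16)  # reconstruct raw token contents

    def masked_prediction_loss(seq, mask_ratio=0.5):
        # seq: (batch, time, 16) interleaved sensorimotor readings.
        x = tok(seq)
        mask = torch.rand(x.shape[:2]) < mask_ratio
        x = x.masked_fill(mask.unsqueeze(-1), 0.0)  # blank out masked positions
        pred = head(trunk(x))
        return ((pred - seq) ** 2)[mask].mean()     # score only masked tokens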
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data [101.43350024175157]
Self-supervised learning has the potential to decrease the amount of human annotation and engineering effort required to learn control strategies.
Our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem.
We demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks.
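In this spirit, a contrastive goal-reaching critic can be sketched as a
classifier over (state, action, goal) triples, with the other rows of the
batch serving as negatives. The architecture below is an assumption in the
style of contrastive RL, not the paper's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    sa_enc = nn.Linear(64 + 7, 32)  # (state, action) embedding
    g_enc = nn.Linear(64, 32)       # goal embedding

    def contrastive_critic_loss(states, actions, future_states):
        # Row i's true future state is the positive; other rows are negatives.
        phi = sa_enc(torch.cat([states, actions], dim=-1))
        psi = g_enc(future_states)
        logits = phi @ psi.t()              # critic f(s, a, g) = phi . psi
        labels = torch.arange(len(states))
        return F.cross_entropy(logits, labels)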
arXiv Detail & Related papers (2023-06-06T01:36:56Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
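That reward form is easy to sketch: embed the current observation and a goal
image, and return their negative distance in embedding space. The encoder
below is a stand-in; the paper learns it from human videos with a
time-contrastive objective.

    import torch
    import torch.nn as nn

    phi = nn.Linear(3 * 32 * 32, 64)  # stand-in for the learned video encoder

    def reward(obs_image, goal_image):
        # Flatten images, embed both, and reward proximity to the goal.
        z_obs = phi(obs_image.flatten(start_dim=-3).float())
        z_goal = phi(goal_image.flatten(start_dim=-3).float())
        return -torch.norm(z_obs - z_goal, dim=-1)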
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis is an attempt to learn reward functions from human users using other, more reliable data modalities.
We first propose various forms of comparative feedback (e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons) and describe how a robot can use these forms of human feedback to infer a reward function.
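One standard way to turn pairwise comparisons into a reward, in the spirit of
this line of work, is a Bradley-Terry model over summed trajectory rewards.
The sketch below assumes precomputed state features; the feature size and
reward model are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    r = nn.Linear(64, 1)  # per-state reward model

    def preference_loss(traj_a, traj_b, a_preferred):
        # traj_*: (batch, time, 64) state features; a_preferred: (batch,) in {0, 1}.
        ret_a = r(traj_a).sum(dim=(1, 2))  # summed reward along trajectory A
        ret_b = r(traj_b).sum(dim=(1, 2))
        logits = ret_a - ret_b             # Bradley-Terry log-odds that "A wins"
        return F.binary_cross_entropy_with_logits(logits, a_preferred.float())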
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
- Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
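A minimal sketch of this relabeling idea: whenever a trial incidentally
completes some other task, file it as a demonstration for that task. The
`infer_completed_task` helper is hypothetical (e.g., a per-task success
classifier).

    def relabel_trials(trials, demo_buffers, infer_completed_task):
        # trials: list of trajectories; demo_buffers: dict task_id -> list.
        for trajectory in trials:
            achieved = infer_completed_task(trajectory)    # task actually solved
            if achieved is not None:
                demo_buffers[achieved].append(trajectory)  # reuse trial as a demo
        return demo_buffers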
arXiv Detail & Related papers (2020-02-25T18:56:42Z)