Learning Dexterous Manipulation from Suboptimal Experts
- URL: http://arxiv.org/abs/2010.08587v2
- Date: Tue, 5 Jan 2021 17:22:00 GMT
- Title: Learning Dexterous Manipulation from Suboptimal Experts
- Authors: Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang
Zhou, Alexandre Galashov, Nicolas Heess, Francesco Nori
- Abstract summary: Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
- Score: 69.8017067648129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning dexterous manipulation in high-dimensional state-action spaces is an
important open challenge with exploration presenting a major bottleneck.
Although in many cases the learning process could be guided by demonstrations
or other suboptimal experts, current RL algorithms for continuous action spaces
often fail to effectively utilize combinations of highly off-policy expert data
and on-policy exploration data. As a solution, we introduce Relative Entropy
Q-Learning (REQ), a simple policy iteration algorithm that combines ideas from
successful offline and conventional RL algorithms. It represents the optimal
policy via importance sampling from a learned prior and is well-suited to take
advantage of mixed data distributions. We demonstrate experimentally that REQ
outperforms several strong baselines on robotic manipulation tasks for which
suboptimal experts are available. We show how suboptimal experts can be
constructed effectively by composing simple waypoint tracking controllers, and
we also show how learned primitives can be combined with waypoint controllers
to obtain reference behaviors to bootstrap a complex manipulation task on a
simulated bimanual robot with human-like hands. Finally, we show that REQ is
also effective for general off-policy RL, offline RL, and RL from
demonstrations. Videos and further materials are available at
sites.google.com/view/rlfse.
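The central mechanism in the abstract, representing the improved policy by importance sampling from a learned prior and reweighting candidates by their Q-values, can be illustrated with a short sketch. This is not the authors' implementation: the `sample_prior` and `q_fn` callables and the temperature `eta` are hypothetical stand-ins, and the exp(Q/eta) reweighting is the generic scheme such relative-entropy methods typically use.

```python
import numpy as np

def req_style_action(state, sample_prior, q_fn, eta=1.0, n_samples=64, rng=None):
    """Pick an action by importance-weighting prior samples with their Q-values.

    sample_prior(state, n) -> (n, action_dim) candidate actions from the learned prior
    q_fn(state, actions)   -> (n,) critic estimates for each candidate
    eta                    -> temperature of the exponential reweighting
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = sample_prior(state, n_samples)        # candidates from the learned prior
    q_values = q_fn(state, actions)                 # critic scores for the candidates
    logits = (q_values - q_values.max()) / eta      # subtract max to stabilise the exponential
    weights = np.exp(logits)
    weights /= weights.sum()                        # normalised importance weights
    idx = rng.choice(n_samples, p=weights)          # resample proportionally to exp(Q / eta)
    return actions[idx]
```

In this reading, the prior could be fit to a mixture of expert and exploration data (for example by behaviour cloning), which is what makes this kind of reweighted sampling a natural fit for the mixed data distributions the abstract describes.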
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Equivariant Offline Reinforcement Learning [7.822389399560674]
We investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations.
Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts.
arXiv Detail & Related papers (2024-06-20T03:02:49Z)
- REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation [61.7171775202833]
We introduce an efficient system for learning dexterous manipulation skills with reinforcement learning.
The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping.
Our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy.
arXiv Detail & Related papers (2023-09-06T19:05:31Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
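The AWAC entry above describes rapid learning from a mixture of demonstration data and online experience; its usual formulation is an advantage-weighted likelihood objective on replayed actions. Below is a minimal PyTorch-style sketch of that loss, assuming hypothetical `policy`, `q_fn`, and `value_fn` modules and a temperature `lam`; it illustrates the general idea rather than reproducing the paper's code.

```python
import torch

def awac_policy_loss(policy, q_fn, value_fn, obs, actions, lam=1.0):
    """Advantage-weighted regression loss on a batch of (obs, action) pairs.

    policy(obs)        -> torch distribution whose log_prob(actions) is (batch,)
    q_fn(obs, actions) -> (batch,) Q-value estimates
    value_fn(obs)      -> (batch,) state-value estimates (e.g. Q under policy samples)
    """
    with torch.no_grad():
        advantage = q_fn(obs, actions) - value_fn(obs)
        weights = torch.exp(advantage / lam)      # exponentiated-advantage weights
    log_prob = policy(obs).log_prob(actions)      # imitate replayed / demonstration actions
    return -(weights * log_prob).mean()           # weighted maximum likelihood, to be minimised
```

Because the weights come from the critic rather than from the data-collecting policy, the same update can be applied to offline demonstrations and to fresh online experience, which is the combination the entry highlights.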