SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended
Exploration
- URL: http://arxiv.org/abs/2211.13743v1
- Date: Thu, 24 Nov 2022 18:05:01 GMT
- Title: SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended
Exploration
- Authors: Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, Abbas
Abdolmaleki, Ben Moran, Tuomas Haarnoja, Jan Humplik, Roland Hafner, Michael
Neunert, Claudio Fantacci, Tim Hertweck, Thomas Lampe, Fereshteh Sadeghi,
Nicolas Heess and Martin Riedmiller
- Abstract summary: Skill reuse is one of the most common approaches, but current methods have considerable limitations.
We introduce an alternative approach to mitigate these problems.
Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience.
- Score: 21.764280583041703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to effectively reuse prior knowledge is a key requirement when
building general and flexible Reinforcement Learning (RL) agents. Skill reuse
is one of the most common approaches, but current methods have considerable
limitations. For example, fine-tuning an existing policy frequently fails, as
the policy can degrade rapidly early in training. In a similar vein,
distillation of expert behavior can lead to poor results when given sub-optimal
experts. We compare several common approaches for skill transfer on multiple
domains including changes in task and system dynamics. We identify how existing
methods can fail and introduce an alternative approach to mitigate these
problems. Our approach learns to sequence existing temporally-extended skills
for exploration but learns the final policy directly from the raw experience.
This conceptual split enables rapid adaptation and thus efficient data
collection, but without constraining the final solution. It significantly
outperforms many classical methods across a suite of evaluation tasks, and we
use a broad set of ablations to highlight the importance of different
components of our method.
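To make the exploration/learning split concrete, the following is a minimal, self-contained Python sketch of the idea: a high-level scheduler picks which frozen skill to execute for a fixed horizon (temporally-extended exploration), while every raw low-level transition is stored so that the final policy can be trained off-policy from that data. The toy chain environment, the two hard-coded skills, and the bandit-style scheduler are illustrative stand-ins, not components described in the paper.
```python
import random
from collections import defaultdict, deque

class ChainEnv:
    """Toy 1-D chain: start at position 0, sparse reward at position `goal`."""
    def __init__(self, goal=10, max_steps=200):
        self.goal, self.max_steps = goal, max_steps
    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos
    def step(self, action):                      # action is -1 or +1
        self.pos += action
        self.t += 1
        done = self.pos == self.goal or self.t >= self.max_steps
        reward = 1.0 if self.pos == self.goal else 0.0
        return self.pos, reward, done

# Stand-ins for pre-trained skills: fixed temporally-extended behaviours.
SKILLS = {0: lambda obs: -1, 1: lambda obs: +1}  # walk left / walk right

class SkillScheduler:
    """Epsilon-greedy value estimates over skills per observation: the
    high-level component that learns which skill to sequence next."""
    def __init__(self, n_skills, eps=0.2, lr=0.1):
        self.q = defaultdict(lambda: [0.0] * n_skills)
        self.n, self.eps, self.lr = n_skills, eps, lr
    def select(self, obs):
        if random.random() < self.eps:
            return random.randrange(self.n)
        return max(range(self.n), key=lambda k: self.q[obs][k])
    def update(self, obs, skill_id, skill_return):
        self.q[obs][skill_id] += self.lr * (skill_return - self.q[obs][skill_id])

def collect_episode(env, scheduler, buffer, skill_horizon=5):
    """Exploration: run the chosen skill for up to `skill_horizon` low-level
    steps, but store every raw transition for the final policy to learn from."""
    obs, done = env.reset(), False
    while not done:
        skill_id = scheduler.select(obs)
        start_obs, skill_return = obs, 0.0
        for _ in range(skill_horizon):
            action = SKILLS[skill_id](obs)
            next_obs, reward, done = env.step(action)
            buffer.append((obs, action, reward, next_obs, done))  # raw experience
            skill_return += reward
            obs = next_obs
            if done:
                break
        scheduler.update(start_obs, skill_id, skill_return)

# The final policy would then be trained off-policy from `buffer` (e.g. with
# Q-learning on the raw transitions), so it is not restricted to the skills
# that generated the data.
buffer = deque(maxlen=10_000)
env, scheduler = ChainEnv(), SkillScheduler(n_skills=len(SKILLS))
for _ in range(20):
    collect_episode(env, scheduler, buffer)
print(f"collected {len(buffer)} raw transitions")
```
Because the buffer holds low-level transitions rather than skill-level decisions, the downstream learner is free to improve on, or ignore, the exploration skills.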
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Online Continual Learning via the Knowledge Invariant and Spread-out Properties [4.109784267309124]
A key challenge in continual learning is catastrophic forgetting.
We propose a new method, named Online Continual Learning via the Knowledge Invariant and Spread-out Properties (OCLKISP).
We empirically evaluate our proposed method on four popular benchmarks for continual learning: Split CIFAR 100, Split SVHN, Split CUB200 and Split Tiny-Image-Net.
arXiv Detail & Related papers (2023-02-02T04:03:38Z)
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
- Class-Incremental Learning via Knowledge Amalgamation [14.513858688486701]
Catastrophic forgetting has been a significant problem hindering the deployment of deep learning algorithms in the continual learning setting.
We put forward an alternative strategy to handle catastrophic forgetting with knowledge amalgamation (CFA).
CFA learns a student network from multiple heterogeneous teacher models specializing in previous tasks and can be applied to current offline methods.
arXiv Detail & Related papers (2022-09-05T19:49:01Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge of machine learning (ML) and describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
arXiv Detail & Related papers (2021-08-15T15:05:02Z)
- Decaying Clipping Range in Proximal Policy Optimization [0.0]
Proximal Policy Optimization (PPO) is among the most widely used algorithms in reinforcement learning.
Keys to its success are the reliable policy updates through the clipping mechanism and the multiple epochs of minibatch updates.
We propose linearly and exponentially decaying clipping range approaches throughout the training (a schedule sketch appears after this list).
arXiv Detail & Related papers (2021-02-20T22:08:05Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
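As an aside on the Decaying Clipping Range entry above, the schedule it proposes can be sketched in a few lines. The starting value, decay factor, and step counts below are placeholders rather than values taken from that paper.
```python
def linear_clip_range(step, total_steps, eps_start=0.2, eps_end=0.0):
    """Clip range shrinks linearly from eps_start to eps_end over training."""
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def exponential_clip_range(step, eps_start=0.2, decay=0.9999):
    """Clip range shrinks by a constant factor every training step."""
    return eps_start * decay ** step

# Either schedule replaces PPO's constant epsilon in the clipped surrogate,
# clip(ratio, 1 - eps_t, 1 + eps_t), with eps_t recomputed at each update.
```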