Bayesian Nonparametrics for Offline Skill Discovery
- URL: http://arxiv.org/abs/2202.04675v1
- Date: Wed, 9 Feb 2022 19:01:01 GMT
- Title: Bayesian Nonparametrics for Offline Skill Discovery
- Authors: Valentin Villecroze, Harry J. Braviner, Panteha Naderian, Chris J.
Maddison, Gabriel Loaiza-Ganem
- Abstract summary: Recent work in offline reinforcement learning and imitation learning has proposed several techniques for skill discovery from a set of expert trajectories.
We first propose a method for offline learning of options exploiting advances in variational inference and continuous relaxations.
We show how our nonparametric extension can be applied in other skill frameworks, and empirically demonstrate that our method can outperform state-of-the-art offline skill learning algorithms.
- Score: 19.28178596044852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Skills or low-level policies in reinforcement learning are temporally
extended actions that can speed up learning and enable complex behaviours.
Recent work in offline reinforcement learning and imitation learning has
proposed several techniques for skill discovery from a set of expert
trajectories. While these methods are promising, the number K of skills to
discover is always a fixed hyperparameter, which requires either prior
knowledge about the environment or an additional parameter search to tune it.
We first propose a method for offline learning of options (a particular skill
framework) exploiting advances in variational inference and continuous
relaxations. We then highlight an unexplored connection between Bayesian
nonparametrics and offline skill discovery, and show how to obtain a
nonparametric version of our model. This version is tractable thanks to a
carefully structured approximate posterior with a dynamically-changing number
of options, removing the need to specify K. We also show how our nonparametric
extension can be applied in other skill frameworks, and empirically demonstrate
that our method can outperform state-of-the-art offline skill learning
algorithms across a variety of environments. Our code is available at
https://github.com/layer6ai-labs/BNPO .
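As a rough illustration of the two ingredients the abstract names, the sketch below pairs a stick-breaking construction (the standard Bayesian-nonparametric device behind a dynamically sized set of mixture components) with a Gumbel-Softmax relaxation that makes discrete option choices differentiable. It is a minimal sketch under assumed names (`stick_breaking_weights`, `sample_option`, truncation level `K_max = 10`), not the model from the paper or its repository.
```python
import torch
import torch.nn.functional as F

def stick_breaking_weights(beta: torch.Tensor) -> torch.Tensor:
    """Turn stick fractions beta_k in (0, 1) into mixture weights
    w_k = beta_k * prod_{j<k} (1 - beta_j), truncated at K_max = len(beta)."""
    log_rest = torch.cumsum(torch.log1p(-beta), dim=-1)
    # Shift right so index k multiplies only (1 - beta_j) for j < k.
    log_rest = F.pad(log_rest[..., :-1], (1, 0), value=0.0)
    return torch.exp(torch.log(beta) + log_rest)

def sample_option(weights: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Relaxed one-hot sample over options, differentiable w.r.t. weights."""
    return F.gumbel_softmax(torch.log(weights + 1e-8), tau=tau, hard=False)

torch.manual_seed(0)
beta = torch.sigmoid(torch.randn(10))  # variational stick fractions, K_max = 10
w = stick_breaking_weights(beta)       # weights sum to < 1; the leftover tail
                                       # is mass reserved for "unused" options
z = sample_option(w)                   # relaxed one-hot option indicator
print(w, z.argmax().item())
```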
Related papers
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a well-performing policy.
We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills.
We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
arXiv Detail & Related papers (2024-03-11T17:49:18Z)
- Customizable Combination of Parameter-Efficient Modules for Multi-Task Learning [11.260650180067278]
We introduce C-Poly, a novel approach that combines task-common skills and task-specific skills.
A skill assignment matrix is jointly learned.
Our findings demonstrate that C-Poly outperforms fully-shared, task-specific, and skill-indistinguishable baselines.
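One plausible reading of the "skill assignment matrix" is a learnable tasks-by-skills matrix that routes each task to a weighted mix of shared modules plus its own private module. The sketch below is a guess at that structure; `SkillRouter`, `n_common`, and the additive combination are all assumptions, not C-Poly's actual architecture.
```python
import torch
import torch.nn as nn

class SkillRouter(nn.Module):
    """Hypothetical structure: a pool of task-common skill modules combined
    through a jointly learned task-by-skill assignment matrix, plus one
    task-specific module per task."""

    def __init__(self, n_tasks: int, n_common: int, dim: int):
        super().__init__()
        self.common = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_common)])
        self.private = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_tasks)])
        # Logits of the (n_tasks x n_common) assignment matrix.
        self.assign = nn.Parameter(torch.zeros(n_tasks, n_common))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        weights = torch.softmax(self.assign[task_id], dim=-1)
        shared = sum(w * skill(x) for w, skill in zip(weights, self.common))
        return shared + self.private[task_id](x)

router = SkillRouter(n_tasks=3, n_common=4, dim=8)
out = router(torch.randn(2, 8), task_id=1)  # shape (2, 8)
```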
arXiv Detail & Related papers (2023-12-06T02:47:56Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with the user's intervention signals themselves serving as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
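A toy rendering of "intervention signals themselves as rewards": relabel logged transitions so that steps where the expert intervened carry a negative reward, then hand them to any off-policy RL learner. `Transition` and the -1/0 scheme are hypothetical, not the paper's implementation.
```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Transition:
    obs: tuple
    action: int
    next_obs: tuple
    intervened: bool   # did the expert take over at this step?
    reward: float = 0.0

def relabel_with_interventions(batch: List[Transition]) -> List[Transition]:
    """Treat the intervention signal itself as the reward: -1 where the
    expert intervened, 0 elsewhere, so an off-policy RL algorithm learns
    to avoid the states that trigger interventions."""
    return [replace(t, reward=-1.0 if t.intervened else 0.0) for t in batch]
```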
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in terms of accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
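A common recipe for "leveraging return information" in RL-via-supervised-learning is to condition the policy on the return-to-go and train it by plain regression, then ask for a high return at test time. The sketch below shows that generic recipe with placeholder data and dimensions; it is not the paper's implicit model.
```python
import torch
import torch.nn as nn

# Return-conditioned behavior cloning: fit pi(a | s, return-to-go) by
# regression on a fixed dataset, then condition on a high return to act.
policy = nn.Sequential(nn.Linear(4 + 1, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(256, 4)   # placeholder logged states
actions = torch.randn(256, 2)  # placeholder logged actions
rtg = torch.randn(256, 1)      # placeholder return-to-go labels

for _ in range(100):
    pred = policy(torch.cat([states, rtg], dim=-1))
    loss = ((pred - actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time, condition on a high desired return.
action = policy(torch.cat([torch.randn(1, 4), torch.tensor([[2.0]])], dim=-1))
```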
arXiv Detail & Related papers (2022-10-21T21:59:42Z)
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
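For context, Quality Diversity methods such as MAP-Elites maintain an archive holding the best solution found in each behavior niche and evolve by mutating archive members. A minimal generic loop, with an invented toy `evaluate` function, might look like this:
```python
import random

def evaluate(params):
    """Toy stand-in: returns (fitness, behavior descriptor in [0, 1])."""
    fitness = -sum(p * p for p in params)
    behavior = min(max((params[0] + 1.0) / 2.0, 0.0), 1.0)
    return fitness, behavior

random.seed(0)
N_NICHES = 20
archive = {}  # niche index -> (fitness, params): one elite per behavior niche

for _ in range(2000):
    if archive:
        _, parent = random.choice(list(archive.values()))
        child = [p + random.gauss(0.0, 0.1) for p in parent]
    else:
        child = [random.uniform(-1.0, 1.0) for _ in range(3)]
    fit, beh = evaluate(child)
    niche = min(int(beh * N_NICHES), N_NICHES - 1)
    if niche not in archive or fit > archive[niche][0]:
        archive[niche] = (fit, child)  # keep the best solution per niche

print(f"{len(archive)} of {N_NICHES} niches filled")  # diversity + quality
```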
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
- DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning [29.80680408934347]
We propose an alternative framework for incremental learning in which we continually fine-tune the model from a pre-trained representation.
Our method takes advantage of a linearization technique for pre-trained neural networks to enable simple and effective continual learning.
We show that our method can be applied to general continual learning settings, and we evaluate it on data-incremental, task-incremental, and class-incremental learning problems.
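The "linearization technique" is presumably the first-order Taylor expansion of the network around its pre-trained weights, f(x; w0 + delta) ≈ f(x; w0) + ∇_w f(x; w0) · delta, which turns fine-tuning into training a model that is linear in delta. A minimal sketch with `torch.func` (assumptions throughout; not the paper's code):
```python
import torch
from torch.func import functional_call, jvp

model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 3)
)
w0 = {k: v.detach() for k, v in model.named_parameters()}  # frozen pre-trained weights
delta = {k: torch.zeros_like(v) for k, v in w0.items()}    # what fine-tuning would train

def f_lin(x: torch.Tensor) -> torch.Tensor:
    """Linearized network: f(x; w0) plus the Jacobian-vector product with delta."""
    def f(params):
        return functional_call(model, params, (x,))
    out0, tangent_out = jvp(f, (w0,), (delta,))
    return out0 + tangent_out

x = torch.randn(4, 8)
print(torch.allclose(f_lin(x), model(x)))  # True: delta == 0 recovers the model
# Continual fine-tuning would then optimize `delta` while w0 stays fixed.
```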
arXiv Detail & Related papers (2022-08-17T06:58:14Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
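As background for what "intra-option learning" refers to: the classic intra-option Q-learning update from the options literature (Sutton, Precup & Singh, 1999) applies the update to every option whose policy is consistent with the executed action, not only the option that was executing. A sketch of that textbook update, not this paper's extension:
```python
import numpy as np

def intra_option_q_update(Q, beta, consistent, s, r, s2, alpha=0.1, gamma=0.99):
    """One intra-option Q-learning step.

    Q:          (n_states, n_options) option-value table
    beta:       (n_states, n_options) termination probabilities
    consistent: options whose policy would also have taken the executed action
    """
    for o in consistent:                       # update every consistent option
        cont = (1.0 - beta[s2, o]) * Q[s2, o]  # the option continues in s2
        stop = beta[s2, o] * Q[s2].max()       # it terminates; switch to best
        Q[s, o] += alpha * (r + gamma * (cont + stop) - Q[s, o])
    return Q

Q = np.zeros((5, 3))
beta = np.full((5, 3), 0.5)
Q = intra_option_q_update(Q, beta, consistent=[0, 2], s=1, r=1.0, s2=2)
```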
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits (a) faster convergence and superior generalization over existing adaptive learning methods, and (b) no dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.