Reusable Options through Gradient-based Meta Learning
- URL: http://arxiv.org/abs/2212.11726v2
- Date: Tue, 4 Apr 2023 10:46:54 GMT
- Title: Reusable Options through Gradient-based Meta Learning
- Authors: David Kuric, Herke van Hoof
- Abstract summary: Several deep learning approaches were proposed to learn temporal abstractions in the form of options in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method learns transferable components that accelerate learning and performs better than prior methods.
- Score: 24.59017394648942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical methods in reinforcement learning have the potential to reduce
the number of decisions that the agent needs to make when learning new
tasks. However, finding useful, reusable temporal abstractions that facilitate
fast learning remains a challenging problem. Recently, several deep learning
approaches were proposed to learn such temporal abstractions in the form of
options in an end-to-end manner. In this work, we point out several
shortcomings of these methods and discuss their potential negative
consequences. Subsequently, we formulate the desiderata for reusable options
and use these to frame the problem of learning options as a gradient-based
meta-learning problem. This allows us to formulate an objective that explicitly
incentivizes options which allow a higher-level decision maker to adjust in few
steps to different tasks. Experimentally, we show that our method learns
transferable components that accelerate learning and performs better
than prior methods developed for this setting. Additionally, we
perform ablations to quantify the impact of using gradient-based meta-learning
as well as other proposed changes.
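The meta-learning formulation above can be illustrated with a minimal sketch. The code below is a hypothetical first-order MAML-style loop on a toy quadratic loss, not the paper's actual algorithm: shared parameters (standing in for option parameters) are meta-trained so that per-task parameters (standing in for the higher-level policy) reach low loss after only a few gradient steps.

```python
import numpy as np

def task_loss(theta, phi, target):
    # Toy per-task loss: theta (shared, option-like) plus phi
    # (task-specific, policy-like) should match the task's target.
    return 0.5 * np.sum((theta + phi - target) ** 2)

def residual(theta, phi, target):
    # Gradient of task_loss w.r.t. both theta and phi (they coincide here).
    return theta + phi - target

def meta_train(targets, inner_steps=3, inner_lr=0.5, meta_lr=0.5, meta_iters=200):
    theta = np.zeros(2)  # shared parameters, meta-learned
    for _ in range(meta_iters):
        meta_grad = np.zeros_like(theta)
        for target in targets:
            phi = np.zeros_like(theta)    # fresh task-specific parameters
            for _ in range(inner_steps):  # few-step inner adaptation
                phi -= inner_lr * residual(theta, phi, target)
            # First-order (FOMAML-style) outer gradient at the adapted phi.
            meta_grad += residual(theta, phi, target)
        theta -= meta_lr * meta_grad / len(targets)
    return theta

tasks = [np.array([1.0, 2.0]), np.array([3.0, 0.0])]
theta = meta_train(tasks)  # settles near the task mean, [2.0, 1.0]
```

With the shared parameters meta-trained this way, a few inner gradient steps drive the loss on a new task near zero, which is the property the paper's objective rewards in the option setting.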
Related papers
- Clustering-based Domain-Incremental Learning [4.835091081509403]
A key challenge in continual learning is the so-called "catastrophic forgetting" problem.
We propose an online clustering-based approach on a dynamically updated finite pool of samples or gradients.
We demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-09-21T13:49:05Z) - The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning [59.777127897688594]
We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options.
We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices.
arXiv Detail & Related papers (2022-01-24T13:18:02Z) - Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z) - Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z) - Derivative-Free Reinforcement Learning: A Review [11.568151821073952]
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments.
Derivative-free optimization, meanwhile, is capable of solving sophisticated problems without access to gradient information.
This article summarizes methods of derivative-free reinforcement learning to date, and organizes the methods in aspects including parameter updating, model selection, exploration, and parallel/distributed methods.
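As a concrete illustration of the "parameter updating" family the survey covers, a basic evolution strategy estimates an update direction purely from sampled returns, never differentiating the objective. The sketch below is a generic textbook version, not taken from the survey.

```python
import numpy as np

def evolution_strategy(objective, theta, iters=200, pop=50, sigma=0.1, lr=0.05, seed=0):
    # Derivative-free update: perturb parameters with Gaussian noise,
    # evaluate returns, and move toward perturbations that scored well.
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        noise = rng.standard_normal((pop, theta.size))
        returns = np.array([objective(theta + sigma * n) for n in noise])
        returns -= returns.mean()  # baseline subtraction reduces variance
        theta = theta + lr / (pop * sigma) * noise.T @ returns
    return theta

# Toy return function peaked at [1, -1]; only its evaluations are used.
ret = lambda th: -np.sum((th - np.array([1.0, -1.0])) ** 2)
theta = evolution_strategy(ret, np.zeros(2))  # approaches [1, -1]
```

The same evaluate-perturb-average loop applies when `objective` is an episodic return from a simulator, which is why such methods parallelize well.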
arXiv Detail & Related papers (2021-02-10T19:29:22Z) - Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches depend only on the current task's information during adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach.
arXiv Detail & Related papers (2020-10-19T08:06:47Z) - Learning Diverse Options via InfoMax Termination Critic [0.0]
We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning.
Motivated by the recent success of mutual information based skill learning, we hypothesize that more diverse options are more reusable.
We propose a gradient-based method for learning options by maximizing the MI between options and the corresponding state transitions.
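To make the MI objective concrete, here is a plug-in estimate of the mutual information between a discrete option and the transition it produces; this is an illustration of the quantity being maximized, not the paper's gradient-based estimator.

```python
import numpy as np

def mutual_information(options, transitions):
    # Empirical MI between two discrete variables from paired samples.
    options, transitions = np.asarray(options), np.asarray(transitions)
    mi = 0.0
    for o in np.unique(options):
        for t in np.unique(transitions):
            p_ot = np.mean((options == o) & (transitions == t))
            if p_ot > 0:
                p_o = np.mean(options == o)
                p_t = np.mean(transitions == t)
                mi += p_ot * np.log(p_ot / (p_o * p_t))
    return mi

# Diverse options: each option deterministically causes a distinct transition.
diverse = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])    # log(2) ≈ 0.693
# Redundant options: both options cause the same transition, so MI is 0.
redundant = mutual_information([0, 0, 1, 1], [0, 0, 0, 0])  # 0.0
```

High MI means the options are distinguishable from their effects on the environment, which is the notion of diversity the hypothesis above appeals to.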
arXiv Detail & Related papers (2020-10-06T14:21:05Z) - Incremental Object Detection via Meta-Learning [77.55310507917012]
We propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared.
In comparison to existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new classes, and scales to high-capacity models for object detection.
arXiv Detail & Related papers (2020-03-17T13:40:00Z) - Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and a maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
arXiv Detail & Related papers (2020-03-04T04:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.