Flexible Option Learning
- URL: http://arxiv.org/abs/2112.03097v1
- Date: Mon, 6 Dec 2021 15:07:48 GMT
- Title: Flexible Option Learning
- Authors: Martin Klissarov and Doina Precup
- Abstract summary: We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
- Score: 69.78645585943592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal abstraction in reinforcement learning (RL) offers the promise of
improving generalization and knowledge transfer in complex environments by
propagating information more efficiently over time. Although option learning
was initially formulated in a way that allows updating many options
simultaneously, using off-policy, intra-option learning (Sutton, Precup &
Singh, 1999), many of the recent hierarchical reinforcement learning approaches
only update a single option at a time: the option currently executing. We
revisit and extend intra-option learning in the context of deep reinforcement
learning, in order to enable updating all options consistent with current
primitive action choices, without introducing any additional estimates. Our
method can therefore be naturally adopted in most hierarchical RL frameworks.
When we combine our approach with the option-critic algorithm for option
discovery, we obtain significant improvements in performance and
data-efficiency across a wide variety of domains.
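The abstract's central idea, off-policy intra-option learning (Sutton, Precup & Singh, 1999), is that a single primitive transition can update every option whose policy is consistent with the executed action, not just the option currently executing. A minimal tabular sketch of this update is below; the state/option sizes, random intra-option policies, and termination probabilities are hypothetical placeholders for illustration, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_options = 5, 3, 4
gamma, alpha = 0.99, 0.1

# Tabular option values Q(s, o), stochastic intra-option policies
# pi[o, s, a], and termination probabilities beta[o, s].  All are
# illustrative stand-ins; in practice these come from the agent.
Q = np.zeros((n_states, n_options))
pi = rng.dirichlet(np.ones(n_actions), size=(n_options, n_states))
beta = np.full((n_options, n_states), 0.1)

def intra_option_update(s, a, r, s_next):
    """After executing primitive action a in state s, update ALL
    options consistent with that choice, weighted by pi_o(a|s),
    rather than only the currently executing option."""
    for o in range(n_options):
        weight = pi[o, s, a]  # probability option o would have chosen a
        if weight == 0.0:
            continue
        # Backup target: continue with o w.p. (1 - beta), else
        # terminate and re-select the best option in s_next.
        u = (1 - beta[o, s_next]) * Q[s_next, o] \
            + beta[o, s_next] * Q[s_next].max()
        Q[s, o] += alpha * weight * (r + gamma * u - Q[s, o])

# One illustrative transition: state 0, action 1, reward 1.0, next state 2.
intra_option_update(0, 1, 1.0, 2)
```

Because the update is weighted by each option's own action probability, it requires no additional estimates (e.g. no importance-sampling corrections over option selection), which is why the abstract notes the method drops into most hierarchical RL frameworks.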
Related papers
- Matching options to tasks using Option-Indexed Hierarchical
Reinforcement Learning [20.85397773933171]
We propose a novel option indexing approach to hierarchical learning (OI-HRL).
This allows us to effectively reuse a large library of pretrained options for zero-shot generalization at test time.
We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems.
arXiv Detail & Related papers (2022-06-12T14:39:02Z) - Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z) - Active Reinforcement Learning over MDPs [29.59790449462737]
This paper proposes a framework of Active Reinforcement Learning (ARL) over MDPs that improves generalization efficiency under limited resources through instance selection.
Unlike existing approaches, we actively select and use training data rather than training on all the given data, thereby consuming fewer resources.
arXiv Detail & Related papers (2021-08-05T00:18:11Z) - Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Discovery of Options via Meta-Learned Subgoals [59.2160583043938]
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
We introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments.
arXiv Detail & Related papers (2021-02-12T19:50:40Z) - Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches depend only on the current task's information during adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach.
arXiv Detail & Related papers (2020-10-19T08:06:47Z) - Learning Diverse Options via InfoMax Termination Critic [0.0]
We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning.
Motivated by the recent success of mutual information based skill learning, we hypothesize that more diverse options are more reusable.
We propose a gradient-based method for learning options by maximizing the mutual information (MI) between options and their corresponding state transitions.
arXiv Detail & Related papers (2020-10-06T14:21:05Z) - SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
arXiv Detail & Related papers (2020-06-25T13:06:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.