Iterative Option Discovery for Planning, by Planning
- URL: http://arxiv.org/abs/2310.01569v2
- Date: Fri, 22 Dec 2023 23:04:53 GMT
- Title: Iterative Option Discovery for Planning, by Planning
- Authors: Kenny Young, Richard S. Sutton
- Abstract summary: Building on the Expert Iteration approach used in AlphaZero, we propose an analogous approach to option discovery called Option Iteration.
Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future.
Having learned such a set of locally strong policies, we can use them to guide the search algorithm, resulting in a virtuous cycle where better options lead to better search results.
- Score: 15.731719079249814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discovering useful temporal abstractions, in the form of options, is widely
thought to be key to applying reinforcement learning and planning to
increasingly complex domains. Building on the empirical success of the Expert
Iteration approach to policy learning used in AlphaZero, we propose Option
Iteration, an analogous approach to option discovery. Rather than learning a
single strong policy that is trained to match the search results everywhere,
Option Iteration learns a set of option policies trained such that for each
state encountered, at least one policy in the set matches the search results
for some horizon into the future. Intuitively, this may be significantly easier
as it allows the algorithm to hedge its bets compared to learning a single
globally strong policy, which may have complex dependencies on the details of
the current state. Having learned such a set of locally strong policies, we can
use them to guide the search algorithm, resulting in a virtuous cycle in which
better options lead to better search results, which in turn allow the training
of better options. We demonstrate experimentally that planning using options
learned with Option Iteration leads to a significant benefit in challenging
planning environments compared to an analogous planning algorithm operating in
the space of primitive actions and learning a single rollout policy with Expert
Iteration.
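The hedged objective at the heart of Option Iteration lends itself to a short sketch. The Python snippet below is a minimal illustration under our own assumptions, not the authors' released code: names such as `option_logits` and `search_targets` and all tensor shapes are hypothetical. It scores each of K option policies on how well it matches the search-derived action distributions over an H-step window and trains only the best-matching option per state, so no single option has to be strong everywhere.

```python
# Minimal sketch of an Option Iteration-style update (hypothetical API;
# not the authors' code). K option policies are each scored on how well
# they imitate the search-derived action distributions over an H-step
# window; only the best-matching option per state receives gradient.
import torch
import torch.nn.functional as F

def option_iteration_loss(option_logits: torch.Tensor,
                          search_targets: torch.Tensor) -> torch.Tensor:
    """option_logits:  (B, K, H, A) logits of K option policies over an
                       H-step horizon with A primitive actions, per state.
       search_targets: (B, H, A) action distributions produced by search
                       (e.g. visit-count distributions, as in AlphaZero).
    """
    log_probs = F.log_softmax(option_logits, dim=-1)  # (B, K, H, A)
    # Cross-entropy of each option against the search targets, summed
    # over the horizon: how poorly option k imitates search from here.
    ce = -(search_targets.unsqueeze(1) * log_probs).sum(dim=(-2, -1))  # (B, K)
    # Hedging: only the best (minimum-loss) option at each state must
    # match the search results; the others may specialize elsewhere.
    return ce.min(dim=1).values.mean()

# Toy usage with random tensors (B=32 states, K=4 options, H=5 steps, A=6 actions).
logits = torch.randn(32, 4, 5, 6, requires_grad=True)
targets = F.softmax(torch.randn(32, 5, 6), dim=-1)
option_iteration_loss(logits, targets).backward()
```

In an Expert Iteration-style outer loop, the trained options would then serve as rollout policies inside the search itself, closing the virtuous cycle described in the abstract.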
Related papers
- Multi-Task Option Learning and Discovery for Stochastic Path Planning [27.384742641275228]
This paper addresses the problem of reliably and efficiently solving broad classes of long-horizon path planning problems.
Our approach computes useful options with policies as well as high-level paths that compose the discovered options.
We show that this approach yields strong guarantees of executability and solvability.
arXiv Detail & Related papers (2022-09-30T19:57:52Z)
- Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning [20.85397773933171]
We propose a novel option indexing approach to hierarchical learning (OI-HRL).
This allows us to effectively reuse a large library of pretrained options, enabling zero-shot generalization at test time.
We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems.
arXiv Detail & Related papers (2022-06-12T14:39:02Z)
- CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
arXiv Detail & Related papers (2020-06-25T13:06:59Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)