On the Role of Weight Sharing During Deep Option Learning
- URL: http://arxiv.org/abs/1912.13408v2
- Date: Thu, 6 Feb 2020 06:19:04 GMT
- Title: On the Role of Weight Sharing During Deep Option Learning
- Authors: Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, Miao Liu, Gerald Tesauro
- Abstract summary: The options framework is a popular approach for building temporally extended actions in reinforcement learning.
Past work makes the key assumption that each of the components of option-critic has independent parameters.
We consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update.
- Score: 21.216780543401235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The options framework is a popular approach for building temporally extended
actions in reinforcement learning. In particular, the option-critic
architecture provides general purpose policy gradient theorems for learning
actions from scratch that are extended in time. However, past work makes the
key assumption that each of the components of option-critic has independent
parameters. In this work we note that while this key assumption of the policy
gradient theorems of option-critic holds in the tabular case, it is always
violated in practice for the deep function approximation setting. We thus
reconsider this assumption and consider more general extensions of
option-critic and hierarchical option-critic training that optimize for the
full architecture with each update. It turns out that not assuming parameter
independence challenges a belief in prior work that training the policy over
options can be disentangled from the dynamics of the underlying options. In
fact, learning can be sped up by focusing the policy over options on states
where options are actually likely to terminate. We put our new algorithms to
the test in application to sample efficient learning of Atari games, and
demonstrate significantly improved stability and faster convergence when
learning long options.
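To make the weight-sharing issue concrete, here is a minimal, illustrative PyTorch-style sketch (an assumption, not the authors' code; the class name SharedOptionCritic, the layer sizes, and the termination-weighted loss below are all hypothetical) in which the policy over options, the intra-option policies, and the termination functions all read from one shared feature extractor, so their gradients are coupled and the parameter-independence assumption of the tabular theorems no longer holds.

```python
# Minimal sketch (assumption: a PyTorch-style setup, not the authors' implementation).
# All option-critic heads sit on one shared torso, so an update to any head also
# moves the features used by every other head.
import torch
import torch.nn as nn


class SharedOptionCritic(nn.Module):
    """Option-critic heads on top of a single shared feature extractor."""

    def __init__(self, obs_dim, n_options, n_actions, hidden=64):
        super().__init__()
        # Shared torso: this is where the tabular parameter-independence
        # assumption breaks, since every head backpropagates into it.
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_over_options = nn.Linear(hidden, n_options)                # pi_Omega(w | s)
        self.intra_option_policies = nn.Linear(hidden, n_options * n_actions)  # pi_w(a | s)
        self.terminations = nn.Linear(hidden, n_options)                       # beta_w(s)
        self.n_options, self.n_actions = n_options, n_actions

    def forward(self, obs):
        h = self.torso(obs)
        pi_omega = torch.softmax(self.policy_over_options(h), dim=-1)
        pi_w = torch.softmax(
            self.intra_option_policies(h).view(-1, self.n_options, self.n_actions),
            dim=-1,
        )
        beta = torch.sigmoid(self.terminations(h))
        return pi_omega, pi_w, beta


def policy_over_options_loss(pi_omega, beta, option, advantage):
    """Hypothetical illustration of the abstract's intuition: weight the
    policy-over-options update by the (detached) termination probability,
    so it concentrates on states where the current option is likely to end."""
    log_pi = torch.log(pi_omega.gather(-1, option.unsqueeze(-1)).squeeze(-1) + 1e-8)
    weight = beta.gather(-1, option.unsqueeze(-1)).squeeze(-1).detach()
    return -(weight * advantage.detach() * log_pi).mean()
```

The detached termination weight in the last function is one simple way to read the claim that learning can be sped up by focusing the policy over options on states where options are actually likely to terminate; the exact update used in the paper may differ.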
Related papers
- SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments [18.081732498034047]
This work compares ways of extending Reinforcement Learning algorithms to Partially Observable Markov Decision Processes (POMDPs) with options.
Two algorithms, PPOEM and SOAP, are proposed and studied in depth to address this problem.
arXiv Detail & Related papers (2024-07-26T17:59:55Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned, but only their deterministic versions are deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Iterative Option Discovery for Planning, by Planning [15.731719079249814]
We propose an analogous approach to option discovery called Option Iteration.
Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future.
Having learned such a set of locally strong policies, we can use them to guide the search algorithm resulting in a virtuous cycle where better options lead to better search results.
arXiv Detail & Related papers (2023-10-02T19:03:30Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- Temporal Abstraction in Reinforcement Learning with the Successor Representation [65.69658154078007]
We argue that the successor representation (SR) can be seen as a natural substrate for the discovery and use of temporal abstractions.
We show how the SR can be used to discover options that facilitate either temporally-extended exploration or planning.
arXiv Detail & Related papers (2021-10-12T05:07:43Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Diversity-Enriched Option-Critic [47.82697599507171]
We show that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks.
Our approach generates robust, reusable, reliable and interpretable options, in contrast to option-critic.
arXiv Detail & Related papers (2020-11-04T22:12:54Z)
- Data-efficient Hindsight Off-policy Option Learning [20.42535406663446]
We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm.
It robustly trains all policy components off-policy and end-to-end.
The approach outperforms existing option learning methods on common benchmarks.
arXiv Detail & Related papers (2020-07-30T16:52:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.