Diversity-Enriched Option-Critic
- URL: http://arxiv.org/abs/2011.02565v1
- Date: Wed, 4 Nov 2020 22:12:54 GMT
- Title: Diversity-Enriched Option-Critic
- Authors: Anand Kamat and Doina Precup
- Abstract summary: We show that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks.
Our approach generates robust, reusable, reliable and interpretable options, in contrast to option-critic.
- Score: 47.82697599507171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal abstraction allows reinforcement learning agents to represent
knowledge and develop strategies over different temporal scales. The
option-critic framework has been demonstrated to learn temporally extended
actions, represented as options, end-to-end in a model-free setting. However,
the feasibility of option-critic remains limited by two major challenges:
multiple options adopting very similar behavior, or a shrinking set of
task-relevant options. These occurrences not only void the need for temporal
abstraction, they also degrade performance. In this paper, we tackle these
problems by learning a diverse set of options. We introduce an
information-theoretic intrinsic reward, which augments the task reward, as well
as a novel termination objective, in order to encourage behavioral diversity in
the option set. We show empirically that our proposed method is capable of
learning options end-to-end on several discrete and continuous control tasks,
and that it outperforms option-critic by a wide margin. Furthermore, we show that our
approach sustainably generates robust, reusable, reliable and interpretable
options, in contrast to option-critic.
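As a rough illustration of the reward-shaping idea in the abstract, the sketch below augments the task reward with a diversity bonus computed from the options' action distributions at the current state. The pairwise-KL measure, the beta coefficient, and the function names pairwise_kl_diversity / augmented_reward are assumptions made for illustration only; the paper's actual information-theoretic intrinsic reward and termination objective may take a different form.

```python
import numpy as np

def pairwise_kl_diversity(option_action_probs, eps=1e-8):
    """Average pairwise KL divergence between the options' action
    distributions at the current state -- one possible way to quantify
    behavioral diversity in the option set (illustrative, not the paper's
    exact measure)."""
    n = len(option_action_probs)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j:
                p, q = option_action_probs[i], option_action_probs[j]
                total += float(np.sum(p * np.log((p + eps) / (q + eps))))
                pairs += 1
    return total / max(pairs, 1)

def augmented_reward(task_reward, option_action_probs, beta=0.05):
    """Task reward plus a scaled diversity bonus acting as an intrinsic reward."""
    return task_reward + beta * pairwise_kl_diversity(option_action_probs)

# Usage: four options over three actions at the current state.
probs = [np.array([0.7, 0.2, 0.1]),
         np.array([0.1, 0.8, 0.1]),
         np.array([0.2, 0.2, 0.6]),
         np.array([0.34, 0.33, 0.33])]
print(augmented_reward(task_reward=1.0, option_action_probs=probs))
```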
Related papers
- Reusable Options through Gradient-based Meta Learning [24.59017394648942]
Several deep learning approaches have been proposed to learn temporal abstractions, in the form of options, in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method learns transferable components that accelerate learning, and that it performs better than prior methods.
arXiv Detail & Related papers (2022-12-22T14:19:35Z) - The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning [59.777127897688594]
We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options.
We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices.
arXiv Detail & Related papers (2022-01-24T13:18:02Z) - Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z) - Temporal Abstraction in Reinforcement Learning with the Successor Representation [65.69658154078007]
We argue that the successor representation (SR) can be seen as a natural substrate for the discovery and use of temporal abstractions.
We show how the SR can be used to discover options that facilitate either temporally-extended exploration or planning.
arXiv Detail & Related papers (2021-10-12T05:07:43Z) - Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z) - SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
arXiv Detail & Related papers (2020-06-25T13:06:59Z) - Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints [0.6445605125467573]
Reinforcement learning can benefit from options as a way of encoding recurring behaviours and fostering exploration.
We investigate some of the conditions that influence optimality of options, in settings where agents have a limited time budget for learning each task.
We show that the discovered options differ significantly depending on factors such as the available learning-time budget, and that they outperform popular option-generation methods.
arXiv Detail & Related papers (2020-01-06T15:08:46Z) - Options of Interest: Temporal Abstraction with Interest Functions [58.30081828754683]
We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option.
We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture.
arXiv Detail & Related papers (2020-01-01T21:24:39Z)
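The entry above can be read as reweighting the policy over options by a per-option interest function of the state, generalizing hard initiation sets. The following is a minimal, hypothetical sketch of that reweighting; the function name and the renormalization step are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def interest_weighted_option_probs(mu, interest):
    """Reweight a base policy over options by per-option interest values
    at the current state, then renormalize (illustrative formulation)."""
    weighted = mu * interest          # mu(option | s) * I_option(s)
    return weighted / weighted.sum()

# Usage: three options; interest values act like soft initiation sets.
mu = np.array([0.5, 0.3, 0.2])        # base policy over options at state s
interest = np.array([0.9, 0.1, 0.8])  # learned interest of each option in s
print(interest_weighted_option_probs(mu, interest))
```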
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.