Learning Diverse Options via InfoMax Termination Critic
- URL: http://arxiv.org/abs/2010.02756v2
- Date: Wed, 31 May 2023 04:06:15 GMT
- Title: Learning Diverse Options via InfoMax Termination Critic
- Authors: Yuji Kanagawa and Tomoyuki Kaneko
- Abstract summary: We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning.
Motivated by the recent success of mutual information (MI) based skill learning, we hypothesize that more diverse options are more reusable.
We propose a method for learning termination conditions of options by maximizing MI between options and their corresponding state transitions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of autonomously learning reusable temporally extended actions, or options, in reinforcement learning. While options can speed up transfer learning by serving as reusable building blocks, learning reusable options for an unknown task distribution remains challenging. Motivated by the recent success of mutual information (MI) based skill learning, we hypothesize that more diverse options are more reusable. To this end, we propose a method for learning termination conditions of options by maximizing MI between options and corresponding state transitions. We derive a scalable approximation of this MI maximization via gradient ascent, yielding the InfoMax Termination Critic (IMTC) algorithm. Our experiments demonstrate that IMTC significantly improves the diversity of learned options without extrinsic rewards when combined with an intrinsic option learning method. Moreover, we test the reusability of learned options by transferring them into various tasks, confirming that IMTC enables quick adaptation, especially in complex domains where an agent needs to manipulate objects.
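The abstract does not spell out the MI objective, but MI-based skill-learning work typically estimates a term like I(O; S_f | S_0) with a learned variational classifier (the Barber-Agakov lower bound). The sketch below is a minimal PyTorch illustration under that assumption; `OptionPosterior`, the uniform option prior, and all shapes are hypothetical and not the authors' implementation, and the gradient of this bound with respect to termination-condition parameters (the "termination critic" part of IMTC) is omitted.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class OptionPosterior(nn.Module):
    """Variational classifier q(o | s0, sf): predicts which option
    produced the transition from start state s0 to end state sf."""

    def __init__(self, state_dim: int, n_options: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_options),
        )

    def forward(self, s0: torch.Tensor, sf: torch.Tensor) -> torch.Tensor:
        # Unnormalized logits over options.
        return self.net(torch.cat([s0, sf], dim=-1))


def mi_lower_bound(posterior: OptionPosterior,
                   s0: torch.Tensor,
                   sf: torch.Tensor,
                   option: torch.Tensor,
                   n_options: int) -> torch.Tensor:
    """Lower bound on I(O; S_f | S_0) = H(O) - H(O | S_0, S_f):
    E[log q(o | s0, sf)] + log |O|, assuming options are drawn
    uniformly so that H(O) = log |O|."""
    log_q = F.log_softmax(posterior(s0, sf), dim=-1)
    log_q_o = log_q.gather(-1, option.unsqueeze(-1)).squeeze(-1)
    return (log_q_o + math.log(n_options)).mean()
```

Maximizing this bound by gradient ascent rewards options whose start-to-termination transitions are distinguishable from one another, which is the diversity pressure the abstract describes.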
Related papers
- Reusable Options through Gradient-based Meta Learning [24.59017394648942]
Several deep learning approaches have been proposed to learn temporal abstractions in the form of options in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method learns transferable components that accelerate learning and performs better than existing methods.
arXiv Detail & Related papers (2022-12-22T14:19:35Z)
- The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning [59.777127897688594]
We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options.
We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices.
arXiv Detail & Related papers (2022-01-24T13:18:02Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate that the learned options are more efficient, interpretable, and reusable than those learned by option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL consistently outperforms its counterparts across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Discovery of Options via Meta-Learned Subgoals [59.2160583043938]
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
We introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments.
arXiv Detail & Related papers (2021-02-12T19:50:40Z)
- Diversity-Enriched Option-Critic [47.82697599507171]
We show that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks.
Our approach generates robust, reusable, reliable and interpretable options, in contrast to option-critic.
arXiv Detail & Related papers (2020-11-04T22:12:54Z)
- Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints [0.6445605125467573]
Reinforcement learning can benefit from the use of options as a way of encoding recurring behaviours and fostering exploration.
We investigate some of the conditions that influence optimality of options, in settings where agents have a limited time budget for learning each task.
We show that the discovered options differ significantly depending on factors such as the available learning time budget, and that they outperform options produced by popular option-generation methods.
arXiv Detail & Related papers (2020-01-06T15:08:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.