Temporal Abstraction in Reinforcement Learning with the Successor
Representation
- URL: http://arxiv.org/abs/2110.05740v3
- Date: Tue, 11 Apr 2023 21:03:47 GMT
- Title: Temporal Abstraction in Reinforcement Learning with the Successor
Representation
- Authors: Marlos C. Machado and Andre Barreto and Doina Precup and Michael
Bowling
- Abstract summary: We argue that the successor representation (SR) can be seen as a natural substrate for the discovery and use of temporal abstractions.
We show how the SR can be used to discover options that facilitate either temporally-extended exploration or planning.
- Score: 65.69658154078007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning at multiple levels of temporal abstraction is one of the key
attributes of intelligence. In reinforcement learning, this is often modeled
through temporally extended courses of action called options. Options allow
agents to make predictions and to operate at different levels of abstraction
within an environment. Nevertheless, approaches based on the options framework
often start with the assumption that a reasonable set of options is known
beforehand. When this is not the case, there are no definitive answers for
which options one should consider. In this paper, we argue that the successor
representation (SR), which encodes states based on the pattern of state
visitation that follows them, can be seen as a natural substrate for the
discovery and use of temporal abstractions. To support our claim, we take a big
picture view of recent results, showing how the SR can be used to discover
options that facilitate either temporally-extended exploration or planning. We
cast these results as instantiations of a general framework for option
discovery in which the agent's representation is used to identify useful
options, which are then used to further improve its representation. This
results in a virtuous, never-ending cycle in which both the representation and
the options are constantly refined based on each other. Beyond option discovery
itself, we also discuss how the SR allows us to augment a set of options into a
combinatorially large counterpart without additional learning. This is achieved
through the combination of previously learned options. Our empirical evaluation
focuses on options discovered for exploration and on the use of the SR to
combine them. The results of our experiments shed light on important design
decisions involved in the definition of options and demonstrate the synergy of
different methods based on the SR, such as eigenoptions and the option
keyboard.
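To make the abstract's central objects concrete, here is a minimal sketch, assuming a toy five-state chain under a uniform-random policy (the environment, policy, and hyperparameters are illustrative assumptions, not the paper's exact setup). It computes the successor representation in closed form, Psi = (I - gamma * P)^{-1}, and derives the intrinsic reward that defines an eigenoption from a non-trivial eigenvector of the SR, following the eigenoption construction the abstract refers to.

```python
# Minimal sketch: successor representation (SR) and an eigenoption's
# intrinsic reward on a toy MDP. Illustrative only; the environment,
# policy, and hyperparameters are assumptions, not the paper's setup.
import numpy as np

n_states = 5          # a 5-state chain: 0 - 1 - 2 - 3 - 4
gamma = 0.9           # discount factor

# Transition matrix P[s, s'] under a uniform-random policy that moves
# left or right with equal probability (walls reflect at the ends).
P = np.zeros((n_states, n_states))
for s in range(n_states):
    left, right = max(s - 1, 0), min(s + 1, n_states - 1)
    P[s, left] += 0.5
    P[s, right] += 0.5

# SR in closed form: Psi[s, s'] = E[ sum_t gamma^t * 1(s_t = s') | s_0 = s ],
# which for a fixed policy equals (I - gamma * P)^{-1}.
Psi = np.linalg.inv(np.eye(n_states) - gamma * P)

# Eigenoptions: take an eigenvector e of the SR; the option's intrinsic
# reward for a transition s -> s' is r_int = e[s'] - e[s], so the option
# learns to climb (or, with the sign flipped, descend) that eigenvector.
eigvals, eigvecs = np.linalg.eig(Psi)
order = np.argsort(-eigvals.real)     # sort eigenvalues, descending
e = eigvecs[:, order[1]].real         # first non-trivial eigenvector

def intrinsic_reward(s, s_next):
    """Eigenoption intrinsic reward for the transition s -> s_next."""
    return e[s_next] - e[s]

print("SR row for state 0:", np.round(Psi[0], 3))
print("r_int(0 -> 1):", round(intrinsic_reward(0, 1), 3))
```

In practice the SR is learned incrementally, e.g. with a TD update of the form Psi[s] += alpha * (onehot(s) + gamma * Psi[s'] - Psi[s]), and in large state spaces the eigendecomposition is replaced by learned successor features; the closed form above is only viable for small tabular problems.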
Related papers
- Reward-Respecting Subtasks for Model-Based Reinforcement Learning [13.906158484935098]
Reinforcement learning must include planning with a model of the world that is abstract in state and time.
One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning.
We show that option models obtained from reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic.
arXiv Detail & Related papers (2022-02-07T19:09:27Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also show that the learned options are more efficient, interpretable, and reusable than those learned by option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Flexible Option Learning [69.78645585943592]
We revisit and extend intra-option learning in the context of deep reinforcement learning.
We obtain significant improvements in performance and data-efficiency across a wide variety of domains.
arXiv Detail & Related papers (2021-12-06T15:07:48Z)
- Context-Specific Representation Abstraction for Deep Option Learning [43.68681795014662]
We introduce Context-Specific Representation Abstraction for Deep Option Learning (CRADOL).
CRADOL is a new framework that considers both temporal abstraction and context-specific representation abstraction to effectively reduce the size of the search over policy space.
Specifically, our method learns a factored belief state representation that enables each option to learn a policy over only a subsection of the state space.
arXiv Detail & Related papers (2021-09-20T22:50:01Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Diversity-Enriched Option-Critic [47.82697599507171]
We show that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks.
In contrast to option-critic, our approach generates robust, reusable, reliable, and interpretable options.
arXiv Detail & Related papers (2020-11-04T22:12:54Z)
- Options of Interest: Temporal Abstraction with Interest Functions [58.30081828754683]
We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option.
We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture.
arXiv Detail & Related papers (2020-01-01T21:24:39Z)
- On the Role of Weight Sharing During Deep Option Learning [21.216780543401235]
The options framework is a popular approach for building temporally extended actions in reinforcement learning.
Past work makes the key assumption that each of the components of option-critic has independent parameters.
We consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update.
arXiv Detail & Related papers (2019-12-31T16:49:13Z)
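Returning to the main paper's claim that the SR lets an agent combine previously learned options into a combinatorially large set without additional learning (the option keyboard mentioned in the abstract), here is a minimal, hedged sketch of the underlying mechanism: generalized policy evaluation and improvement over successor features. The tensor shapes and the random psi values are placeholder assumptions for illustration only, not the paper's implementation.

```python
# Minimal sketch of combining options via successor features (GPE + GPI),
# the mechanism behind the option keyboard. Shapes and the random psi
# values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_options, n_states, n_actions, d = 3, 6, 4, 2

# psi[o, s, a] = successor features of base option o's policy: the
# expected discounted sum of a d-dimensional cumulant starting from (s, a).
psi = rng.random((n_options, n_states, n_actions, d))

def gpi_action(s, w):
    """Act greedily w.r.t. the best base option under preference vector w.

    Q_o(s, a) = psi[o, s, a] . w   (generalized policy evaluation);
    taking the argmax over both options and actions is generalized
    policy improvement, which synthesizes behavior for a 'new'
    combined option without any further learning.
    """
    q = psi[:, s, :, :] @ w                          # shape: (n_options, n_actions)
    return np.unravel_index(np.argmax(q), q.shape)   # (best option, best action)

# A new task expressed as a preference over the same cumulants:
w_new = np.array([1.0, -0.5])
option, action = gpi_action(s=2, w=w_new)
print(f"state 2: follow option {option}, take action {action}")
```

The point of the sketch is that once each base option's successor features are known, any new preference vector w over the same cumulants immediately yields a new combined behavior, which is how a small set of learned options scales to a combinatorially large one.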