Reward-Respecting Subtasks for Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2202.03466v4
- Date: Sat, 16 Sep 2023 23:59:17 GMT
- Title: Reward-Respecting Subtasks for Model-Based Reinforcement Learning
- Authors: Richard S. Sutton and Marlos C. Machado and G. Zacharias Holland and
David Szepesvari and Finbarr Timbers and Brian Tanner and Adam White
- Abstract summary: Reinforcement learning must include planning with a model of the world that is abstract in state and time.
One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning.
We show that option models obtained from reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic.
- Score: 13.906158484935098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To achieve the ambitious goals of artificial intelligence, reinforcement
learning must include planning with a model of the world that is abstract in
state and time. Deep learning has made progress with state abstraction, but
temporal abstraction has rarely been used, despite extensively developed theory
based on the options framework. One reason for this is that the space of
possible options is immense, and the methods previously proposed for option
discovery do not take into account how the option models will be used in
planning. Options are typically discovered by posing subsidiary tasks, such as
reaching a bottleneck state or maximizing the cumulative sum of a sensory
signal other than reward. Each subtask is solved to produce an option, and then
a model of the option is learned and made available to the planning process. In
most previous work, the subtasks ignore the reward on the original problem,
whereas we propose subtasks that use the original reward plus a bonus based on
a feature of the state at the time the option terminates. We show that option
models obtained from such reward-respecting subtasks are much more likely to be
useful in planning than eigenoptions, shortest path options based on bottleneck
states, or reward-respecting options generated by the option-critic.
Reward-respecting subtasks strongly constrain the space of options and thereby also
provide a partial solution to the problem of option discovery. Finally, we show
how values, policies, options, and models can all be learned online and
off-policy using standard algorithms and general value functions.
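As a rough illustration of the construction described in the abstract, the following minimal Python sketch forms the return of a reward-respecting subtask: the ordinary reward accumulated while the option runs, plus a bonus tied to a feature of the state in which the option terminates. The function name, the linear bonus form, and the bonus_weight parameter are illustrative assumptions, not the paper's exact formulation (which is stated with general value functions and discounting).

```python
import numpy as np

def reward_respecting_subtask_return(rewards, stop_features, feature_index,
                                     bonus_weight=1.0):
    """Return for one option execution under a reward-respecting subtask (sketch).

    rewards       : per-step rewards from the original problem while the option ran
    stop_features : feature vector of the state where the option terminated
    feature_index : index of the state feature that defines this subtask
    bonus_weight  : scale of the termination bonus (illustrative hyperparameter)
    """
    # Original reward accumulated during the option (undiscounted for simplicity) ...
    cumulative_reward = float(np.sum(rewards))
    # ... plus a bonus based on a feature of the state at the time the option
    # terminates, as described in the abstract.
    termination_bonus = bonus_weight * float(stop_features[feature_index])
    return cumulative_reward + termination_bonus

# Example: an option that stops where feature 3 is large is credited both for
# the reward it collected and for attaining that feature.
G = reward_respecting_subtask_return(rewards=[0.0, -1.0, 2.0],
                                     stop_features=np.array([0.0, 0.1, 0.0, 0.8]),
                                     feature_index=3)
```

Because the subtask keeps the original reward in its objective, the options it produces stay tied to the main problem, which is the property the abstract credits for making their models useful in planning.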
Related papers
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- Experiment Planning with Function Approximation [49.50254688629728]
We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
arXiv Detail & Related papers (2024-01-10T14:40:23Z)
- Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z)
- Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning [20.85397773933171]
We propose a novel option indexing approach to hierarchical learning (OI-HRL).
This allows us to effectively reuse a large library of pretrained options, in zero-shot generalization at test time.
We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems.
arXiv Detail & Related papers (2022-06-12T14:39:02Z)
- Discrete State-Action Abstraction via the Successor Representation [3.453310639983932]
Abstraction is one approach that provides the agent with an intrinsic reward for transitioning in a latent space.
Our approach is the first to automatically learn a discrete abstraction of the underlying environment.
Our proposed algorithm, Discrete State-Action Abstraction (DSAA), iteratively swaps between training these options and using them to efficiently explore more of the environment.
arXiv Detail & Related papers (2022-06-07T17:37:30Z)
- GrASP: Gradient-Based Affordance Selection for Planning [25.548880832898757]
Planning with a learned model is arguably a key component of intelligence.
We present a method for selecting affordances useful for planning.
We show that it is feasible to learn to select both primitive-action and option affordances.
arXiv Detail & Related papers (2022-02-08T03:24:36Z)
- Temporal Abstraction in Reinforcement Learning with the Successor Representation [65.69658154078007]
We argue that the successor representation (SR) can be seen as a natural substrate for the discovery and use of temporal abstractions.
We show how the SR can be used to discover options that facilitate either temporally-extended exploration or planning.
arXiv Detail & Related papers (2021-10-12T05:07:43Z)
- Temporally Abstract Partial Models [62.12485855601448]
We develop temporally abstract partial option models that take into account the fact that an option might be affordable only in certain situations.
We analyze the trade-offs between estimation and approximation error in planning and learning when using such models.
arXiv Detail & Related papers (2021-08-06T17:26:21Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and to train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)