Globally Optimal Hierarchical Reinforcement Learning for
Linearly-Solvable Markov Decision Processes
- URL: http://arxiv.org/abs/2106.15380v1
- Date: Tue, 29 Jun 2021 13:10:08 GMT
- Title: Globally Optimal Hierarchical Reinforcement Learning for
Linearly-Solvable Markov Decision Processes
- Authors: Guillermo Infante, Anders Jonsson, Vicenç Gómez
- Abstract summary: We present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes.
We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work we present a novel approach to hierarchical reinforcement
learning for linearly-solvable Markov decision processes. Our approach assumes
that the state space is partitioned, and the subtasks consist in moving between
the partitions. We represent value functions on several levels of abstraction,
and use the compositionality of subtasks to estimate the optimal values of the
states in each partition. The policy is implicitly defined on these optimal
value estimates, rather than being decomposed among the subtasks. As a
consequence, our approach can learn the globally optimal policy, and does not
suffer from the non-stationarity of high-level decisions. If several partitions
have equivalent dynamics, the subtasks of those partitions can be shared. If
the set of boundary states is smaller than the entire state space, our approach
can have significantly smaller sample complexity than that of a flat learner,
and we validate this empirically in several experiments.
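For intuition, the decomposition rests on the linearity of the LMDP Bellman equation in the desirability function z = exp(-v): within a partition, one subtask is solved per boundary (exit) state, and the optimal desirability of every interior state is a weighted sum of these subtask solutions, with the weights given by the higher-level desirabilities of the exits. The sketch below illustrates that composition for a single partition; it is a minimal illustration under assumed inputs (a passive transition matrix P, state costs q, and index sets interior and boundary), not the authors' implementation.

```python
# Minimal sketch of per-exit LMDP subtasks and their composition (assumed
# variable names; not the authors' code).
import numpy as np

def subtask_desirabilities(P, q, interior, boundary):
    """Solve one LMDP subtask per boundary (exit) state of a partition.

    P        : (n, n) passive transition matrix over the partition's states
    q        : (n,) state costs
    interior : indices of non-terminal states in the partition
    boundary : indices of exit states (terminal for the subtasks)

    Returns Z of shape (n, len(boundary)); column i is the optimal
    desirability for the subtask "exit at boundary state i" (z = 1 at
    exit i, z = 0 at the other exits).
    """
    n = P.shape[0]
    G = np.diag(np.exp(-q[interior]))          # cost factor for interior states
    A = np.eye(len(interior)) - G @ P[np.ix_(interior, interior)]
    B = G @ P[np.ix_(interior, boundary)]      # flow from interior into exits
    Z = np.zeros((n, len(boundary)))
    Z[boundary] = np.eye(len(boundary))        # each subtask terminates at one exit
    Z[interior] = np.linalg.solve(A, B)        # linear (first-exit) Bellman equation
    return Z

def compose(Z, exit_desirability):
    """Compositionality: the optimal desirability of every state in the
    partition is the per-exit subtask solutions weighted by the
    (higher-level) desirabilities of the exit states."""
    return Z @ exit_desirability
```

Because the per-exit solutions depend only on the partition's internal dynamics, partitions with equivalent dynamics can share them, which is the source of the sample-complexity savings claimed in the abstract.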
Related papers
- Hierarchical Average-Reward Linearly-solvable Markov Decision Processes [11.69049916139847]
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes.
Our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks.
Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
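For background, the average-reward (average-cost) LMDP setting this entry builds on reduces to a principal-eigenvector problem: the optimal desirability function satisfies lam * z = diag(exp(-q)) @ P @ z, with the optimal average cost recovered as -log(lam). The power-iteration sketch below is a generic illustration of that formulation under assumed names P and q, not the hierarchical algorithm proposed in the paper.

```python
# Generic power iteration for the average-cost LMDP eigenproblem
# (background sketch, not the paper's method).
import numpy as np

def average_cost_lmdp(P, q, iters=2000):
    """P : (n, n) passive dynamics, q : (n,) per-state costs.
    Returns (average_cost, z) with average_cost = -log(lam)."""
    G = np.diag(np.exp(-q))
    z = np.ones(P.shape[0])
    lam = 1.0
    for _ in range(iters):
        z = G @ P @ z                # apply the linear Bellman operator
        lam = np.linalg.norm(z)      # estimate of the Perron eigenvalue
        z = z / lam                  # keep the iterate normalized
    return -np.log(lam), z
```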
arXiv Detail & Related papers (2024-07-09T09:06:44Z)
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned during training, but only their deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
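A minimal illustration of the practice described above, under assumed names and a linear-Gaussian policy class: exploration noise with standard deviation sigma is used while learning, and only the noise-free mean action is deployed. The trade-off studied in the paper is how to choose that exploration level; this sketch is not the paper's algorithm.

```python
# Explore stochastically, deploy deterministically (generic sketch).
import numpy as np

class GaussianPolicy:
    def __init__(self, obs_dim, act_dim, sigma=0.3, seed=0):
        self.W = np.zeros((act_dim, obs_dim))   # linear mean parameters
        self.sigma = sigma                      # exploration level (the knob to tune)
        self.rng = np.random.default_rng(seed)

    def act_stochastic(self, obs):
        """Action used while learning: mean plus Gaussian exploration noise."""
        return self.W @ obs + self.sigma * self.rng.standard_normal(self.W.shape[0])

    def act_deterministic(self, obs):
        """Action used at deployment time: the noise-free mean."""
        return self.W @ obs
```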
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework in which the lower-level problem minimizes an expectation conditioned on contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
arXiv Detail & Related papers (2023-10-27T23:24:37Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
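The pessimism principle behind this entry can be shown in a few lines: instead of ranking candidate policies by their off-policy point estimates, rank them by lower confidence bounds so that poorly covered, high-uncertainty policies are penalized. The function below is a generic illustration with assumed inputs, not the PPL algorithm itself.

```python
# Pessimistic policy selection via lower confidence bounds (generic sketch).
import numpy as np

def select_pessimistic(value_estimates, std_errors, beta=1.0):
    """value_estimates, std_errors: per-policy off-policy value estimates and
    their uncertainties; beta controls how conservative the choice is."""
    lcb = np.asarray(value_estimates) - beta * np.asarray(std_errors)
    return int(np.argmax(lcb))

# Example: policy 1 has the best point estimate but is highly uncertain,
# so the pessimistic rule prefers policy 0.
print(select_pessimistic([0.80, 0.95], [0.02, 0.40]))   # -> 0
```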
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Multi-Resolution Online Deterministic Annealing: A Hierarchical and Progressive Learning Architecture [0.0]
We introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space.
We show that the solution of each optimization problem can be estimated online using gradient-free approximation updates.
Asymptotic convergence analysis and experimental results are provided for supervised and unsupervised learning problems.
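As a rough illustration of progressive partitioning with annealing (a generic sketch under assumed parameters, not the paper's algorithm), the snippet below maintains a set of codevectors, softly assigns each incoming sample via a Gibbs distribution whose temperature is gradually lowered, and updates the codevectors with a simple online stochastic-approximation rule.

```python
# Online soft partitioning with a decreasing temperature (generic sketch).
import numpy as np

def online_soft_partition(stream, k, temp=1.0, cooling=0.999, lr=0.05, seed=0):
    """stream: (num_samples, dim) array of data observed one sample at a time."""
    rng = np.random.default_rng(seed)
    centers = rng.standard_normal((k, stream.shape[1]))
    for x in stream:
        d2 = ((centers - x) ** 2).sum(axis=1)           # squared distances to codevectors
        w = np.exp(-(d2 - d2.min()) / temp)             # soft (Gibbs) assignment
        w /= w.sum()
        centers += lr * w[:, None] * (x - centers)      # online update toward the sample
        temp *= cooling                                 # lower the temperature over time
    return centers
```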
arXiv Detail & Related papers (2022-12-15T23:21:49Z)
- Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
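The representation described above can be illustrated compactly: embed a state as the vector of value estimates of the K lower-level skills, and let the high-level policy reason over that vector. The skill value functions are assumed to be given here; this is a generic sketch, not the paper's code.

```python
# Skill-centric state embedding from lower-level value functions (generic sketch).
import numpy as np

def value_function_space_embedding(state, skill_value_fns):
    """skill_value_fns: list of callables, each mapping a state to the value of
    one lower-level skill; the returned vector is the state's skill-centric
    representation used for high-level reasoning."""
    return np.array([v(state) for v in skill_value_fns])

# Example with two toy skills on a 1-D state.
skills = [lambda s: -abs(s - 3.0), lambda s: -abs(s + 1.0)]
print(value_function_space_embedding(0.5, skills))   # -> [-2.5 -1.5]
```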
arXiv Detail & Related papers (2021-11-04T22:46:16Z)
- Context-Specific Representation Abstraction for Deep Option Learning [43.68681795014662]
We introduce Context-Specific Representation Abstraction for Deep Option Learning (CRADOL)
CRADOL is a new framework that considers both temporal abstraction and context-specific representation abstraction to effectively reduce the size of the search over policy space.
Specifically, our method learns a factored belief state representation that enables each option to learn a policy over only a subsection of the state space.
arXiv Detail & Related papers (2021-09-20T22:50:01Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient boosting algorithm that is capable of improving the accuracy of weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Hierarchical Representation Learning for Markov Decision Processes [9.904746542801837]
We present a novel method for learning hierarchical representations of Markov decision processes.
Our method works by partitioning the state space into subsets, and defines subtasks for performing transitions between the partitions.
We empirically validate the method, by showing that it can successfully learn a useful hierarchical representation in a navigation domain.
arXiv Detail & Related papers (2021-06-03T07:53:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this automatically generated content and is not responsible for any consequences of its use.