Globally Optimal Hierarchical Reinforcement Learning for
Linearly-Solvable Markov Decision Processes
- URL: http://arxiv.org/abs/2106.15380v1
- Date: Tue, 29 Jun 2021 13:10:08 GMT
- Title: Globally Optimal Hierarchical Reinforcement Learning for
Linearly-Solvable Markov Decision Processes
- Authors: Guillermo Infante, Anders Jonsson, Vicenç Gómez
- Abstract summary: We present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes.
We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work we present a novel approach to hierarchical reinforcement
learning for linearly-solvable Markov decision processes. Our approach assumes
that the state space is partitioned, and the subtasks consist in moving between
the partitions. We represent value functions on several levels of abstraction,
and use the compositionality of subtasks to estimate the optimal values of the
states in each partition. The policy is implicitly defined on these optimal
value estimates, rather than being decomposed among the subtasks. As a
consequence, our approach can learn the globally optimal policy, and does not
suffer from the non-stationarity of high-level decisions. If several partitions
have equivalent dynamics, the subtasks of those partitions can be shared. If
the set of boundary states is smaller than the entire state space, our approach
can have significantly smaller sample complexity than that of a flat learner,
and we validate this empirically in several experiments.
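For intuition, the decomposition rests on the linearity of the LMDP Bellman equation in the desirability function z = exp(-v): within a partition, one subtask is solved per boundary (exit) state, and the optimal desirability of every interior state is a weighted sum of these subtask solutions, with the weights given by the higher-level desirabilities of the exits. The sketch below illustrates that composition for a single partition; it is a minimal illustration under assumed inputs (a passive transition matrix P, state costs q, and index sets interior and boundary), not the authors' implementation.

```python
# Minimal sketch of per-exit LMDP subtasks and their composition (assumed
# variable names; not the authors' code).
import numpy as np

def subtask_desirabilities(P, q, interior, boundary):
    """Solve one LMDP subtask per boundary (exit) state of a partition.

    P        : (n, n) passive transition matrix over the partition's states
    q        : (n,) state costs
    interior : indices of non-terminal states in the partition
    boundary : indices of exit states (terminal for the subtasks)

    Returns Z of shape (n, len(boundary)); column i is the optimal
    desirability for the subtask "exit at boundary state i" (z = 1 at
    exit i, z = 0 at the other exits).
    """
    n = P.shape[0]
    G = np.diag(np.exp(-q[interior]))          # cost factor for interior states
    A = np.eye(len(interior)) - G @ P[np.ix_(interior, interior)]
    B = G @ P[np.ix_(interior, boundary)]      # flow from interior into exits
    Z = np.zeros((n, len(boundary)))
    Z[boundary] = np.eye(len(boundary))        # each subtask terminates at one exit
    Z[interior] = np.linalg.solve(A, B)        # linear (first-exit) Bellman equation
    return Z

def compose(Z, exit_desirability):
    """Compositionality: the optimal desirability of every state in the
    partition is the per-exit subtask solutions weighted by the
    (higher-level) desirabilities of the exit states."""
    return Z @ exit_desirability
```

Because the per-exit solutions depend only on the partition's internal dynamics, partitions with equivalent dynamics can share them, which is the source of the sample-complexity savings claimed in the abstract.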
Related papers
- Hierarchical Average-Reward Linearly-solvable Markov Decision Processes [11.69049916139847]
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes.
Our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks.
Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
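For background, the average-reward (average-cost) LMDP setting this entry builds on reduces to a principal-eigenvector problem: the optimal desirability function satisfies lam * z = diag(exp(-q)) @ P @ z, with the optimal average cost recovered as -log(lam). The power-iteration sketch below is a generic illustration of that formulation under assumed names P and q, not the hierarchical algorithm proposed in the paper.

```python
# Generic power iteration for the average-cost LMDP eigenproblem
# (background sketch, not the paper's method).
import numpy as np

def average_cost_lmdp(P, q, iters=2000):
    """P : (n, n) passive dynamics, q : (n,) per-state costs.
    Returns (average_cost, z) with average_cost = -log(lam)."""
    G = np.diag(np.exp(-q))
    z = np.ones(P.shape[0])
    lam = 1.0
    for _ in range(iters):
        z = G @ P @ z                # apply the linear Bellman operator
        lam = np.linalg.norm(z)      # estimate of the Perron eigenvalue
        z = z / lam                  # keep the iterate normalized
    return -np.log(lam), z
```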
arXiv Detail & Related papers (2024-07-09T09:06:44Z)
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned during training, but only their deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
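A minimal illustration of the practice described above, under assumed names and a linear-Gaussian policy class: exploration noise with standard deviation sigma is used while learning, and only the noise-free mean action is deployed. The trade-off studied in the paper is how to choose that exploration level; this sketch is not the paper's algorithm.

```python
# Explore stochastically, deploy deterministically (generic sketch).
import numpy as np

class GaussianPolicy:
    def __init__(self, obs_dim, act_dim, sigma=0.3, seed=0):
        self.W = np.zeros((act_dim, obs_dim))   # linear mean parameters
        self.sigma = sigma                      # exploration level (the knob to tune)
        self.rng = np.random.default_rng(seed)

    def act_stochastic(self, obs):
        """Action used while learning: mean plus Gaussian exploration noise."""
        return self.W @ obs + self.sigma * self.rng.standard_normal(self.W.shape[0])

    def act_deterministic(self, obs):
        """Action used at deployment time: the noise-free mean."""
        return self.W @ obs
```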
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework in which the lower-level problem minimizes an expectation conditioned on contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
arXiv Detail & Related papers (2023-10-27T23:24:37Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
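The pessimism principle behind this entry can be shown in a few lines: instead of ranking candidate policies by their off-policy point estimates, rank them by lower confidence bounds so that poorly covered, high-uncertainty policies are penalized. The function below is a generic illustration with assumed inputs, not the PPL algorithm itself.

```python
# Pessimistic policy selection via lower confidence bounds (generic sketch).
import numpy as np

def select_pessimistic(value_estimates, std_errors, beta=1.0):
    """value_estimates, std_errors: per-policy off-policy value estimates and
    their uncertainties; beta controls how conservative the choice is."""
    lcb = np.asarray(value_estimates) - beta * np.asarray(std_errors)
    return int(np.argmax(lcb))

# Example: policy 1 has the best point estimate but is highly uncertain,
# so the pessimistic rule prefers policy 0.
print(select_pessimistic([0.80, 0.95], [0.02, 0.40]))   # -> 0
```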
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Multi-Resolution Online Deterministic Annealing: A Hierarchical and Progressive Learning Architecture [0.0]
We introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space.
We show that the solution of each optimization problem can be estimated online using gradient-free approximation updates.
Asymptotic convergence analysis and experimental results are provided for supervised and unsupervised learning problems.
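As a rough illustration of progressive partitioning with annealing (a generic sketch under assumed parameters, not the paper's algorithm), the snippet below maintains a set of codevectors, softly assigns each incoming sample via a Gibbs distribution whose temperature is gradually lowered, and updates the codevectors with a simple online stochastic-approximation rule.

```python
# Online soft partitioning with a decreasing temperature (generic sketch).
import numpy as np

def online_soft_partition(stream, k, temp=1.0, cooling=0.999, lr=0.05, seed=0):
    """stream: (num_samples, dim) array of data observed one sample at a time."""
    rng = np.random.default_rng(seed)
    centers = rng.standard_normal((k, stream.shape[1]))
    for x in stream:
        d2 = ((centers - x) ** 2).sum(axis=1)           # squared distances to codevectors
        w = np.exp(-(d2 - d2.min()) / temp)             # soft (Gibbs) assignment
        w /= w.sum()
        centers += lr * w[:, None] * (x - centers)      # online update toward the sample
        temp *= cooling                                 # lower the temperature over time
    return centers
```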
arXiv Detail & Related papers (2022-12-15T23:21:49Z)
- Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
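The representation described above can be illustrated compactly: embed a state as the vector of value estimates of the K lower-level skills, and let the high-level policy reason over that vector. The skill value functions are assumed to be given here; this is a generic sketch, not the paper's code.

```python
# Skill-centric state embedding from lower-level value functions (generic sketch).
import numpy as np

def value_function_space_embedding(state, skill_value_fns):
    """skill_value_fns: list of callables, each mapping a state to the value of
    one lower-level skill; the returned vector is the state's skill-centric
    representation used for high-level reasoning."""
    return np.array([v(state) for v in skill_value_fns])

# Example with two toy skills on a 1-D state.
skills = [lambda s: -abs(s - 3.0), lambda s: -abs(s + 1.0)]
print(value_function_space_embedding(0.5, skills))   # -> [-2.5 -1.5]
```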
arXiv Detail & Related papers (2021-11-04T22:46:16Z)
- Context-Specific Representation Abstraction for Deep Option Learning [43.68681795014662]
We introduce Context-Specific Representation Abstraction for Deep Option Learning (CRADOL)
CRADOL is a new framework that considers both temporal abstraction and context-specific representation abstraction to effectively reduce the size of the search over policy space.
Specifically, our method learns a factored belief state representation that enables each option to learn a policy over only a subsection of the state space.
arXiv Detail & Related papers (2021-09-20T22:50:01Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient boosting algorithm that is capable of improving the accuracy of weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Hierarchical Representation Learning for Markov Decision Processes [9.904746542801837]
We present a novel method for learning hierarchical representations of Markov decision processes.
Our method works by partitioning the state space into subsets, and defines subtasks for performing transitions between the partitions.
We empirically validate the method, by showing that it can successfully learn a useful hierarchical representation in a navigation domain.
arXiv Detail & Related papers (2021-06-03T07:53:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this automatically generated content and is not responsible for any consequences of its use.