A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning
- URL: http://arxiv.org/abs/2406.15124v1
- Date: Fri, 21 Jun 2024 13:17:33 GMT
- Title: A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning
- Authors: Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli
- Abstract summary: We present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) levels of temporal abstraction.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned under a fixed high-level policy.
- Score: 54.20447310988282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical Reinforcement Learning (HRL) approaches have shown successful results in solving a large variety of complex, structured, long-horizon problems. Nevertheless, a full theoretical understanding of this empirical evidence is currently missing. In the context of the \emph{option} framework, prior research has devised efficient algorithms for scenarios where options are fixed, and only the high-level policy selecting among options has to be learned. However, the fully realistic scenario in which both the high-level and the low-level policies are learned is surprisingly disregarded from a theoretical perspective. This work takes a step towards understanding this latter scenario. Focusing on the finite-horizon problem, we present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) levels of temporal abstraction. At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned under a fixed high-level policy. The derived bounds are compared with the lower bound for non-hierarchical finite-horizon problems, allowing us to characterize when a hierarchical approach is provably preferable, even without pre-trained options.
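A minimal, self-contained sketch of this alternating scheme is given below. The class names, the environment-free dummy updates, and the phase lengths are illustrative assumptions, not the paper's actual algorithm; the only point the sketch captures is that each phase freezes one level while the other runs its own regret minimizer.

```python
# Hypothetical sketch of the alternating meta-algorithm described in the
# abstract: a high-level regret minimizer over the SMDP induced by the options
# (with inner option policies frozen) alternates with low-level regret
# minimizers for the inner option policies (with the high-level policy frozen).
# All names and the dummy "learning" updates are placeholders, not the paper's
# actual implementation.

import random


class HighLevelLearner:
    """Placeholder SMDP learner that selects among options."""

    def __init__(self, options):
        self.options = options
        self.policy = {o: 1.0 / len(options) for o in options}  # uniform start

    def run_episode(self, option_policies):
        # A real implementation would run a finite-horizon SMDP regret
        # minimizer with the given (temporarily fixed) option policies.
        chosen = random.choice(self.options)
        self.policy[chosen] += 0.01  # dummy update standing in for learning


class LowLevelLearner:
    """Placeholder learner for one option's inner policy."""

    def __init__(self, option):
        self.option = option
        self.policy = 0.0  # stand-in for the option's inner policy parameters

    def run_episode(self, high_policy):
        # A real implementation would run a regret minimizer on the sub-MDPs
        # this option is invoked on, with the high-level policy held fixed.
        self.policy += 0.01  # dummy update


def hierarchical_meta_algorithm(options, n_phases=3, high_episodes=5, low_episodes=5):
    high = HighLevelLearner(options)
    lows = {o: LowLevelLearner(o) for o in options}

    for _ in range(n_phases):
        # High-level phase: option policies frozen, SMDP policy learned.
        for _ in range(high_episodes):
            high.run_episode({o: lows[o].policy for o in options})
        # Low-level phase: high-level policy frozen, inner policies learned.
        for _ in range(low_episodes):
            for o in options:
                lows[o].run_episode(high.policy)

    return high.policy, {o: lows[o].policy for o in options}


if __name__ == "__main__":
    print(hierarchical_meta_algorithm(["go_to_door", "open_door"]))
```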
Related papers
- Hierarchical Average-Reward Linearly-solvable Markov Decision Processes [11.69049916139847]
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes.
Our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks.
Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
arXiv Detail & Related papers (2024-07-09T09:06:44Z)
- A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints [66.61399765513383]
We develop a BLOCC algorithm to tackle BiLevel Optimization problems with Coupled Constraints.
We demonstrate its effectiveness on two well-known real-world applications.
arXiv Detail & Related papers (2024-06-14T15:59:36Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework where the lower-level problem minimizes an expectation conditioned on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
arXiv Detail & Related papers (2023-10-27T23:24:37Z)
- An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes [47.037877670620524]
We present an option-dependent upper bound to the regret suffered by regret minimization algorithms in finite-horizon problems.
We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure.
arXiv Detail & Related papers (2023-05-10T15:00:05Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates (a minimal LCB-selection sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics [5.270497591225775]
In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward.
Several algorithms rooted in sampling-based primal-dual methods have been recently proposed to solve this problem in policy space.
We propose a novel algorithm for constrained RL that does not suffer from these limitations.
arXiv Detail & Related papers (2022-12-03T01:54:55Z)
- A policy gradient approach for Finite Horizon Constrained Markov Decision Processes [6.682382456607199]
We present an algorithm for constrained RL in the finite-horizon setting, where the horizon terminates after a fixed (finite) time.
To the best of our knowledge, our paper presents the first policy gradient algorithm for the finite horizon setting with constraints.
arXiv Detail & Related papers (2022-10-10T09:52:02Z)
- Provable Hierarchical Imitation Learning via EM [2.864550757598007]
We consider learning an options-type hierarchical policy from expert demonstrations.
We characterize the EM approach proposed by Daniel et al.
We prove that the proposed algorithm converges with high probability to a norm ball around the true parameter.
arXiv Detail & Related papers (2020-10-07T03:21:57Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
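The Pessimistic Policy Learning entry above selects according to lower confidence bounds rather than point estimates; the sketch below illustrates that LCB rule on toy offline value estimates. The data, the Hoeffding-style bonus, and every name are assumptions made for illustration, not the paper's construction.

```python
# Illustrative lower-confidence-bound (LCB) selection: instead of picking the
# action with the best point estimate, pick the one whose pessimistic
# (lower-bound) estimate is best, so rarely observed actions are penalized.

import math


def lcb_choice(estimates, counts, total, delta=0.05):
    """Return the action maximizing a Hoeffding-style lower confidence bound."""
    def lcb(action):
        mean = estimates[action]
        n = max(counts[action], 1)
        width = math.sqrt(math.log(2 * len(estimates) * total / delta) / (2 * n))
        return mean - width

    return max(estimates, key=lcb)


if __name__ == "__main__":
    est = {"a": 0.9, "b": 0.7}   # point estimates of each action's value
    cnt = {"a": 3, "b": 500}     # how often each action appears in offline data
    # "a" looks better by point estimate but is barely observed, so the
    # pessimistic rule prefers "b".
    print(lcb_choice(est, cnt, total=503))
```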