A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning
- URL: http://arxiv.org/abs/2406.15124v1
- Date: Fri, 21 Jun 2024 13:17:33 GMT
- Title: A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning
- Authors: Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli
- Abstract summary: We present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) levels of temporal abstraction.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned under a fixed high-level policy.
- Score: 54.20447310988282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical Reinforcement Learning (HRL) approaches have shown successful results in solving a large variety of complex, structured, long-horizon problems. Nevertheless, a full theoretical understanding of this empirical evidence is currently missing. In the context of the \emph{option} framework, prior research has devised efficient algorithms for scenarios where options are fixed, and only the high-level policy selecting among options has to be learned. However, the fully realistic scenario in which both the high-level and the low-level policies are learned is surprisingly disregarded from a theoretical perspective. This work takes a step towards understanding this latter scenario. Focusing on the finite-horizon problem, we present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) levels of temporal abstraction. At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned under a fixed high-level policy. The derived bounds are compared with the lower bound for non-hierarchical finite-horizon problems, allowing us to characterize when a hierarchical approach is provably preferable, even without pre-trained options.
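A minimal, self-contained sketch of this alternating scheme is given below. The class names, the environment-free dummy updates, and the phase lengths are illustrative assumptions, not the paper's actual algorithm; the only point the sketch captures is that each phase freezes one level while the other runs its own regret minimizer.

```python
# Hypothetical sketch of the alternating meta-algorithm described in the
# abstract: a high-level regret minimizer over the SMDP induced by the options
# (with inner option policies frozen) alternates with low-level regret
# minimizers for the inner option policies (with the high-level policy frozen).
# All names and the dummy "learning" updates are placeholders, not the paper's
# actual implementation.

import random


class HighLevelLearner:
    """Placeholder SMDP learner that selects among options."""

    def __init__(self, options):
        self.options = options
        self.policy = {o: 1.0 / len(options) for o in options}  # uniform start

    def run_episode(self, option_policies):
        # A real implementation would run a finite-horizon SMDP regret
        # minimizer with the given (temporarily fixed) option policies.
        chosen = random.choice(self.options)
        self.policy[chosen] += 0.01  # dummy update standing in for learning


class LowLevelLearner:
    """Placeholder learner for one option's inner policy."""

    def __init__(self, option):
        self.option = option
        self.policy = 0.0  # stand-in for the option's inner policy parameters

    def run_episode(self, high_policy):
        # A real implementation would run a regret minimizer on the sub-MDPs
        # this option is invoked on, with the high-level policy held fixed.
        self.policy += 0.01  # dummy update


def hierarchical_meta_algorithm(options, n_phases=3, high_episodes=5, low_episodes=5):
    high = HighLevelLearner(options)
    lows = {o: LowLevelLearner(o) for o in options}

    for _ in range(n_phases):
        # High-level phase: option policies frozen, SMDP policy learned.
        for _ in range(high_episodes):
            high.run_episode({o: lows[o].policy for o in options})
        # Low-level phase: high-level policy frozen, inner policies learned.
        for _ in range(low_episodes):
            for o in options:
                lows[o].run_episode(high.policy)

    return high.policy, {o: lows[o].policy for o in options}


if __name__ == "__main__":
    print(hierarchical_meta_algorithm(["go_to_door", "open_door"]))
```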
Related papers
- Hierarchical Average-Reward Linearly-solvable Markov Decision Processes [11.69049916139847]
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes.
Our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks.
Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
arXiv Detail & Related papers (2024-07-09T09:06:44Z)
- A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints [66.61399765513383]
We develop a BLOCC algorithm to tackle BiLevel Optimization problems with Coupled Constraints.
We demonstrate its effectiveness on two well-known real-world applications.
arXiv Detail & Related papers (2024-06-14T15:59:36Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework where the lower-level problem minimizes an expectation conditioned on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
arXiv Detail & Related papers (2023-10-27T23:24:37Z)
- An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes [47.037877670620524]
We present an option-dependent upper bound to the regret suffered by regret minimization algorithms in finite-horizon problems.
We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure.
arXiv Detail & Related papers (2023-05-10T15:00:05Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates (a minimal LCB-selection sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics [5.270497591225775]
In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward.
Several algorithms rooted in sampling-based primal-dual methods have been recently proposed to solve this problem in policy space.
We propose a novel algorithm for constrained RL that does not suffer from these limitations.
arXiv Detail & Related papers (2022-12-03T01:54:55Z)
- A policy gradient approach for Finite Horizon Constrained Markov Decision Processes [6.682382456607199]
We present an algorithm for constrained RL in the finite-horizon setting, where the horizon terminates after a fixed (finite) time.
To the best of our knowledge, our paper presents the first policy gradient algorithm for the finite horizon setting with constraints.
arXiv Detail & Related papers (2022-10-10T09:52:02Z)
- Provable Hierarchical Imitation Learning via EM [2.864550757598007]
We consider learning an options-type hierarchical policy from expert demonstrations.
We characterize the EM approach proposed by Daniel et al.
We prove that the proposed algorithm converges with high probability to a norm ball around the true parameter.
arXiv Detail & Related papers (2020-10-07T03:21:57Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
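The Pessimistic Policy Learning entry above selects according to lower confidence bounds rather than point estimates; the sketch below illustrates that LCB rule on toy offline value estimates. The data, the Hoeffding-style bonus, and every name are assumptions made for illustration, not the paper's construction.

```python
# Illustrative lower-confidence-bound (LCB) selection: instead of picking the
# action with the best point estimate, pick the one whose pessimistic
# (lower-bound) estimate is best, so rarely observed actions are penalized.

import math


def lcb_choice(estimates, counts, total, delta=0.05):
    """Return the action maximizing a Hoeffding-style lower confidence bound."""
    def lcb(action):
        mean = estimates[action]
        n = max(counts[action], 1)
        width = math.sqrt(math.log(2 * len(estimates) * total / delta) / (2 * n))
        return mean - width

    return max(estimates, key=lcb)


if __name__ == "__main__":
    est = {"a": 0.9, "b": 0.7}   # point estimates of each action's value
    cnt = {"a": 3, "b": 500}     # how often each action appears in offline data
    # "a" looks better by point estimate but is barely observed, so the
    # pessimistic rule prefers "b".
    print(lcb_choice(est, cnt, total=503))
```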