Computational Benefits of Intermediate Rewards for Hierarchical Planning
- URL: http://arxiv.org/abs/2107.03961v1
- Date: Thu, 8 Jul 2021 16:39:13 GMT
- Title: Computational Benefits of Intermediate Rewards for Hierarchical Planning
- Authors: Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma
- Abstract summary: We show that using intermediate rewards reduces the computational complexity in finding a successful policy but does not guarantee finding the shortest path.
We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and other popular deep RL algorithms.
- Score: 42.579256546135866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many hierarchical reinforcement learning (RL) applications have empirically
verified that incorporating prior knowledge in reward design improves
convergence speed and practical performance. We attempt to quantify the
computational benefits of hierarchical RL from a planning perspective under
assumptions about the intermediate state and intermediate rewards frequently
(but often implicitly) adopted in practice. Our approach reveals a trade-off
between computational complexity and the pursuit of the shortest path in
hierarchical planning: using intermediate rewards significantly reduces the
computational complexity in finding a successful policy but does not guarantee
finding the shortest path, whereas using sparse terminal rewards finds the
shortest path at a significantly higher computational cost. We also corroborate
our theoretical results with extensive experiments on the MiniGrid environments
using Q-learning and other popular deep RL algorithms.
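To make the trade-off concrete, below is a minimal sketch (not the paper's experimental setup, which uses MiniGrid and deep RL agents): tabular Q-learning on a toy one-dimensional corridor, run once with only a sparse terminal reward at the goal and once with an additional one-time intermediate reward at an assumed subgoal state. The corridor length, subgoal location, reward values, and hyperparameters are illustrative assumptions. Because the subgoal here lies on the only path to the goal, the sketch can only show the speed-up side of the trade-off; the shortest-path caveat arises when intermediate rewards pull the agent toward subgoals off the optimal path.

```python
# Toy comparison of a sparse terminal reward vs. an added intermediate (subgoal) reward,
# using tabular Q-learning on a 1-D corridor. Illustrative only; all constants and
# hyperparameters below are assumptions, not values from the paper.
import numpy as np

N = 12                 # positions 0 (start) .. N-1 (goal)
SUBGOAL = N // 2       # assumed subgoal position paying a one-time bonus
ACTIONS = (-1, +1)     # move left / move right

def step(pos, got_bonus, action, intermediate):
    """One transition; the state is (position, bonus-collected flag) to keep it Markov."""
    nxt = min(max(pos + ACTIONS[action], 0), N - 1)
    if nxt == N - 1:
        return nxt, got_bonus, 1.0, True                  # sparse terminal reward
    if intermediate and not got_bonus and nxt == SUBGOAL:
        return nxt, True, 0.5, False                      # one-time intermediate reward
    return nxt, got_bonus, 0.0, False

def q_learning(intermediate, episodes=400, cap=300, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Return cumulative environment steps until 10 episodes have reached the goal."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N, 2, len(ACTIONS)))                    # (position, bonus flag, action)
    steps, successes = 0, 0
    for _ in range(episodes):
        pos, got = 0, False
        for _ in range(cap):
            q = Q[pos, int(got)]
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))       # explore
            else:
                a = int(rng.choice(np.flatnonzero(q == q.max())))  # greedy, random tie-break
            npos, ngot, r, done = step(pos, got, a, intermediate)
            target = r + (0.0 if done else gamma * Q[npos, int(ngot)].max())
            Q[pos, int(got), a] += alpha * (target - Q[pos, int(got), a])
            pos, got = npos, ngot
            steps += 1
            if done:
                successes += 1
                break
        if successes >= 10:
            return steps
    return None                                           # never reached 10 successes

print("steps to 10 successes, sparse terminal reward only:", q_learning(False))
print("steps to 10 successes, with intermediate reward:   ", q_learning(True))
```

Comparing the two printed step counts gives a rough sense of how much earlier the shaped agent starts succeeding reliably; on a 2-D layout with subgoals off the optimal route, the same comparison would also expose the longer learned path.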
Related papers
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Results show a significant performance improvement from our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning [25.84621883831624]
Hierarchical reinforcement learning has the potential to solve complex long-horizon tasks using temporal abstraction and increased exploration.
We present primitive-enabled adaptive relabeling (PEAR).
We first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision.
We then jointly optimize HRL agents by employing reinforcement learning (RL) and imitation learning (IL).
arXiv Detail & Related papers (2023-06-10T09:41:30Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Truncating Trajectories in Monte Carlo Reinforcement Learning [48.97155920826079]
In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal.
We propose an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths.
We show that an appropriate truncation of the trajectories can succeed in improving performance.
arXiv Detail & Related papers (2023-05-07T19:41:57Z) - Reinforcement Learning for Classical Planning: Viewing Heuristics as
Dense Reward Generators [54.6441336539206]
We propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL.
These heuristics act as dense reward generators to alleviate the sparse-reward issue and enable our RL agent to learn domain-specific value functions as residuals.
We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL.
arXiv Detail & Related papers (2021-09-30T03:36:01Z) - Off-Policy Reinforcement Learning with Delayed Rewards [16.914712720033524]
In many real-world tasks, instant rewards are not readily accessible or defined immediately after the agent performs actions.
In this work, we first formally define the environment with delayed rewards and discuss the challenges raised due to the non-Markovian nature of such environments.
We introduce a general off-policy RL framework with a new Q-function formulation that can handle the delayed rewards with theoretical convergence guarantees.
arXiv Detail & Related papers (2021-06-22T15:19:48Z) - Learning Guidance Rewards with Trajectory-space Smoothing [22.456737935789103]
Long-term temporal credit assignment is an important challenge in deep reinforcement learning.
Existing policy-gradient and Q-learning algorithms rely on dense environmental rewards that provide rich short-term supervision.
Recent works have proposed algorithms to learn dense "guidance" rewards that could be used in place of the sparse or delayed environmental rewards.
arXiv Detail & Related papers (2020-10-23T23:55:06Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how exploration under a more standard notion of low inherent Bellman error, typically employed in least-squares value iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)