Hierarchical Average-Reward Linearly-solvable Markov Decision Processes
- URL: http://arxiv.org/abs/2407.06690v1
- Date: Tue, 9 Jul 2024 09:06:44 GMT
- Title: Hierarchical Average-Reward Linearly-solvable Markov Decision Processes
- Authors: Guillermo Infante, Anders Jonsson, Vicenç Gómez
- Abstract summary: We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes.
Our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks.
Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
- Score: 11.69049916139847
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks. Our method relies on partitions of the state space that create smaller subtasks which are easier to solve, and on the equivalence between such partitions to learn more efficiently. We then exploit the compositionality of low-level tasks to exactly represent the value function of the high-level task. Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
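The abstract builds on the standard LMDP machinery, in which exponentiating the value function turns the average-reward Bellman equation into an eigenvalue problem. The sketch below solves a single flat average-reward LMDP by power iteration; it is a minimal illustration of that textbook (Todorov-style) formulation under an invented toy model, not the paper's hierarchical method, and the function and variable names are ours.

```python
import numpy as np

# Minimal sketch of the textbook average-reward LMDP formulation, written with
# rewards to match the paper's setting. Not the paper's hierarchical method;
# the toy passive dynamics P and rewards r below are made up.

def solve_average_reward_lmdp(P, r, iters=10_000, tol=1e-12):
    """Power iteration on M = diag(exp(r)) @ P.

    P : (n, n) row-stochastic passive dynamics p(s'|s)
    r : (n,)   per-state rewards
    Returns the optimal gain rho = log(lambda), the exponentiated differential
    value z (principal eigenvector of M), and the optimal controlled
    transitions u*(s'|s) proportional to p(s'|s) z(s').
    """
    M = np.diag(np.exp(r)) @ P
    z = np.ones(P.shape[0])
    lam = 1.0
    for _ in range(iters):
        z_new = M @ z
        lam = z_new.max()                  # eigenvalue estimate once max(z) = 1
        z_new = z_new / lam
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new
    rho = np.log(lam)                      # optimal average reward (gain)
    U = P * z                              # element (s, s') = p(s'|s) * z(s')
    U = U / U.sum(axis=1, keepdims=True)   # normalize rows to get u*(.|s)
    return rho, z, U

# Tiny 3-state example with uniform passive dynamics (illustrative only).
P = np.full((3, 3), 1.0 / 3.0)
r = np.array([0.0, -1.0, -2.0])
rho, z, U = solve_average_reward_lmdp(P, r)
print(rho, np.log(z))
```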
Related papers
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instantiated at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
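The entry above plans over discrete subgoal codes obtained by vector quantization. Below is a generic nearest-codebook quantization step, sketched under the assumption of a fixed random codebook; it is not the paper's trained model.

```python
import numpy as np

# Generic nearest-codebook vector quantization of a subgoal embedding.
# The codebook and the embedding are random placeholders (assumptions).
rng = np.random.default_rng(0)
codebook = rng.random((8, 4))          # 8 discrete subgoal codes, dimension 4

def quantize(embedding):
    """Return the index of the codebook vector closest to the embedding."""
    distances = np.linalg.norm(codebook - embedding, axis=1)
    return int(np.argmin(distances))

subgoal_embedding = rng.random(4)
print(quantize(subgoal_embedding))     # discrete code a planner would use
```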
- New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
arXiv Detail & Related papers (2021-12-09T07:29:57Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical Reinforcement Learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
arXiv Detail & Related papers (2021-11-04T22:46:16Z)
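As described in the entry above, Value Function Spaces represent a state by the vector of values that each low-level skill assigns to it. A minimal sketch follows, with random value tables standing in for learned skills (an assumption for illustration only).

```python
import numpy as np

# Skill-centric state representation: embed a state by the value each
# low-level skill assigns to it. The value tables are random placeholders.
rng = np.random.default_rng(0)
num_skills, num_states = 3, 6
skill_values = rng.random((num_skills, num_states))   # row k holds V_k(s)

def value_function_embedding(state):
    """Represent `state` by the vector (V_1(state), ..., V_K(state))."""
    return skill_values[:, state]

print(value_function_embedding(0))     # a K-dimensional skill-centric feature
```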
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the presence of conflicting gradients.
We introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
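The "conflicting gradients" issue named in the entry above is that two task gradients can have a negative inner product, so a step that helps one task hurts the other. The toy check below only illustrates the problem and the plain gradient-averaging baseline; it is not CAGrad's constrained update, and the vectors are made up.

```python
import numpy as np

# Toy illustration of conflicting gradients in multi-task optimization.
g1 = np.array([1.0, 0.5])              # gradient of task 1's loss
g2 = np.array([-0.8, 0.6])             # gradient of task 2's loss

conflict = float(g1 @ g2) < 0.0        # negative inner product => conflict
averaged_update = 0.5 * (g1 + g2)      # the naive average-gradient step
print(conflict, averaged_update)
```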
- Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes [0.0]
We present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes.
We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition.
arXiv Detail & Related papers (2021-06-29T13:10:08Z)
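The subtask compositionality mentioned above (and exploited by the main paper to represent the high-level value function exactly) is the classical LMDP property that optimal solutions combine linearly after exponentiation. A minimal statement for first-exit subtasks sharing dynamics and interior costs is given below; the weights w_i and the notation are ours.

```latex
% If the exit cost of a new task is a log-linear combination of the base tasks'
% exit costs, exp(-g(s)) = \sum_i w_i \exp(-g_i(s)) on exit states, then its
% exponentiated value function is the same combination of the base solutions:
\[
  z(s) = \sum_i w_i\, z_i(s), \qquad \text{where } z_i(s) = \exp\!\big(-v_i(s)\big).
\]
```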
- Hierarchical Representation Learning for Markov Decision Processes [9.904746542801837]
We present a novel method for learning hierarchical representations of Markov decision processes.
Our method partitions the state space into subsets and defines subtasks for performing transitions between the partitions.
We empirically validate the method by showing that it successfully learns a useful hierarchical representation in a navigation domain.
arXiv Detail & Related papers (2021-06-03T07:53:18Z)
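The entry above partitions the state space and defines subtasks for transitions between partitions. The sketch below shows the idea on an invented 4x4 grid split into four rooms, where each pair of adjacent rooms induces a candidate subtask (reach the neighboring room); it illustrates the general construction, not the paper's algorithm.

```python
from collections import defaultdict

# Illustration of partition-induced subtasks on a made-up 4x4 grid world
# partitioned into four 2x2 "rooms".
WIDTH = HEIGHT = 4
ROOM = 2

def room(state):
    row, col = divmod(state, WIDTH)
    return (row // ROOM, col // ROOM)

def neighbors(state):
    row, col = divmod(state, WIDTH)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + dr, col + dc
        if 0 <= r < HEIGHT and 0 <= c < WIDTH:
            yield r * WIDTH + c

subtasks = defaultdict(set)            # room -> neighboring rooms (exits)
for s in range(WIDTH * HEIGHT):
    for s2 in neighbors(s):
        if room(s) != room(s2):
            subtasks[room(s)].add(room(s2))

print(dict(subtasks))                  # each (room, neighbor) pair is a subtask
```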
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
In this work we overcome the limitations of a single shared representation by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
arXiv Detail & Related papers (2021-01-16T20:54:31Z)