Adversarially Guided Subgoal Generation for Hierarchical Reinforcement
Learning
- URL: http://arxiv.org/abs/2201.09635v1
- Date: Mon, 24 Jan 2022 12:30:38 GMT
- Title: Adversarially Guided Subgoal Generation for Hierarchical Reinforcement
Learning
- Authors: Vivienne Huiling Wang, Joni Pajarinen, Tinghuai Wang, Joni Kämäräinen
- Abstract summary: We propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy.
Experiments with state-of-the-art algorithms show that our approach significantly improves learning efficiency and overall performance of HRL in various challenging continuous control tasks.
- Score: 5.514236598436977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks
by performing decision-making and control at successively higher levels of
temporal abstraction. However, off-policy training in HRL often suffers from
the problem of non-stationary high-level decision making since the low-level
policy is constantly changing. In this paper, we propose a novel HRL approach
for mitigating the non-stationarity by adversarially enforcing the high-level
policy to generate subgoals compatible with the current instantiation of the
low-level policy. In practice, the adversarial learning can be implemented by
training, concurrently with the high-level policy, a simple discriminator network
that determines the compatibility level of subgoals. Experiments with
state-of-the-art algorithms show that our approach significantly improves
learning efficiency and overall performance of HRL in various challenging
continuous control tasks.
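To make the mechanism concrete, here is a minimal sketch of the adversarial compatibility idea in PyTorch. It assumes a discriminator over (state, subgoal) pairs trained with relabelled reached states as positives and high-level proposals as negatives; all module names, dimensions, and the labelling rule are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: a discriminator scores (state, subgoal) compatibility, and the
# high-level policy receives an auxiliary term pushing its subgoals toward the region
# the discriminator judges compatible with the current low-level policy.
import torch
import torch.nn as nn

STATE_DIM, SUBGOAL_DIM = 17, 3  # illustrative dimensions

class CompatibilityDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + SUBGOAL_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, subgoal):
        # Returns the logit of "this subgoal is compatible with the low-level policy".
        return self.net(torch.cat([state, subgoal], dim=-1))

disc = CompatibilityDiscriminator()
disc_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(state, reached_subgoal, proposed_subgoal):
    """Positives: subgoals the low-level policy actually reached (hindsight relabelling).
    Negatives: subgoals currently proposed by the high-level policy."""
    pos = disc(state, reached_subgoal)
    neg = disc(state, proposed_subgoal.detach())
    loss = bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()
    return loss.item()

def compatibility_penalty(state, proposed_subgoal):
    """Auxiliary loss added to the high-level policy objective: lower when the
    generated subgoal looks compatible to the discriminator."""
    return -disc(state, proposed_subgoal).mean()
```

In training, `compatibility_penalty` would be weighted and added to the usual high-level actor loss, so the subgoal generator and the discriminator are updated concurrently.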
Related papers
- Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate the strong performance of HPO, with an improvement of up to 35% over the baselines.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instanced at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy.
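The alternation can be sketched as a simple training loop; the learner interface below (`freeze`, `unfreeze`, `train`) is a hypothetical placeholder, not the paper's algorithm or API.

```python
# Schematic of the alternating scheme: optimize the high level over the induced SMDP
# with the option (low-level) policies frozen, then optimize the option policies
# against a fixed high-level policy. All objects and methods are assumed placeholders.
def alternating_training(high_learner, option_learners, env, phases=10, steps_per_phase=10_000):
    for _ in range(phases):
        # High level: options act as temporally extended SMDP actions, low level frozen.
        for opt in option_learners:
            opt.freeze()
        high_learner.train(env, steps=steps_per_phase)

        # Low level: learn inner option policies while the high-level policy is fixed.
        high_learner.freeze()
        for opt in option_learners:
            opt.unfreeze()
            opt.train(env, steps=steps_per_phase)
        high_learner.unfreeze()
    return high_learner, option_learners
```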
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF [82.73541793388]
We introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation.
We provide theoretical studies of the problem landscape and its penalty-based gradient (policy) algorithms.
We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
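In generic notation, the penalty reformulation replaces the bilevel constraint with an additive penalty on lower-level suboptimality; this is an illustrative form, not necessarily the exact objective analyzed in the paper:

```latex
% Bilevel RL problem and its penalized single-level surrogate (illustrative notation)
\begin{aligned}
\text{bilevel:}\quad & \min_{\theta}\; f\big(\theta, \pi^{*}(\theta)\big)
  \quad \text{s.t.}\quad \pi^{*}(\theta) \in \arg\max_{\pi}\, g(\theta, \pi),\\
\text{penalized:}\quad & \min_{\theta,\,\pi}\; f(\theta, \pi)
  \;+\; \lambda \Big( \max_{\pi'} g(\theta, \pi') \;-\; g(\theta, \pi) \Big),
\end{aligned}
```

where $\lambda > 0$ controls how strongly the lower-level policy $\pi$ is pushed toward optimality for its own objective $g$.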
arXiv Detail & Related papers (2024-02-10T04:54:15Z)
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm that leverages recent advances in the transformer architecture for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout [16.454305212398328]
We propose a goal-conditioned hierarchical reinforcement learning (HRL) framework named Guided Cooperation via Model-based Rollout (GCMR).
GCMR aims to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics.
Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of HIGL, namely ACLG, yields more stable and robust policy improvement.
arXiv Detail & Related papers (2023-09-24T00:13:16Z)
- Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical reinforcement learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
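A timed subgoal can be illustrated as a small data structure that carries both the target state and its deadline; the field names and the success test below are assumptions for illustration, not the HiTS interface.

```python
# Illustrative timed subgoal: the high level specifies *what* goal state to reach
# and *when*; the low level conditions on the remaining time budget.
from dataclasses import dataclass
import numpy as np

@dataclass
class TimedSubgoal:
    goal: np.ndarray   # desired (sub)state to reach
    deadline: int      # environment step at which it should be reached

def low_level_observation(state: np.ndarray, sg: TimedSubgoal, t: int) -> np.ndarray:
    """Low-level input = current state, goal, and time remaining until the deadline."""
    time_left = max(sg.deadline - t, 0)
    return np.concatenate([state, sg.goal, np.array([time_left], dtype=float)])

def subgoal_achieved(state: np.ndarray, sg: TimedSubgoal, t: int, tol: float = 0.1) -> bool:
    """Success requires reaching the goal at (approximately) the specified time."""
    return t == sg.deadline and np.linalg.norm(state[: sg.goal.shape[0]] - sg.goal) < tol
```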
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- Adjacency constraint for efficient hierarchical reinforcement learning [25.15808501708926]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques.
HRL often suffers from training inefficiency as the high-level action space, i.e., the goal space, is large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
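As a rough illustration, the restriction can be pictured as projecting raw high-level outputs back into a region reachable within $k$ low-level steps; the Euclidean-ball proxy for adjacency used below is a simplification for the sketch, whereas the paper characterizes adjacency more carefully.

```python
# Project a proposed subgoal into a crude proxy for the k-step adjacent region of the
# current state. The Euclidean-ball proxy and max_step bound are assumptions.
import numpy as np

def restrict_to_adjacent_region(state: np.ndarray, raw_subgoal: np.ndarray,
                                max_step: float, k: int) -> np.ndarray:
    """Keep the subgoal within roughly k low-level steps of `state`."""
    radius = k * max_step                   # upper bound on how far k steps can travel
    delta = raw_subgoal - state
    dist = np.linalg.norm(delta)
    if dist <= radius:
        return raw_subgoal
    return state + delta * (radius / dist)  # scale back onto the adjacency ball
```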
arXiv Detail & Related papers (2021-10-30T09:26:45Z)
- Efficient Hierarchical Exploration with Stable Subgoal Representation Learning [26.537055962523162]
We propose a state-specific regularization that stabilizes subgoal embeddings in well-explored areas.
We develop an efficient hierarchical exploration strategy that actively seeks out new promising subgoals and states.
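One way to picture a state-specific stabilizer is to anchor the embeddings of frequently visited states to an earlier snapshot of the encoder while leaving rarely visited states free to move; the weighting rule and interfaces below are assumptions for illustration, not the paper's regularizer.

```python
# Hypothetical state-specific stability regularizer for subgoal embeddings.
import torch

def stability_regularizer(encoder, old_encoder, states, visit_counts, scale=1.0):
    with torch.no_grad():
        anchors = old_encoder(states)                 # embeddings from a frozen snapshot
    weights = 1.0 - torch.exp(-scale * visit_counts)  # ~1 in well-explored areas, ~0 otherwise
    drift = ((encoder(states) - anchors) ** 2).sum(dim=-1)
    return (weights * drift).mean()
```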
arXiv Detail & Related papers (2021-05-31T07:28:59Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
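The two ingredients can be sketched as follows, with assumed tensor shapes and a discrete action space for the UCB part; the exact weighting schedule in the paper may differ.

```python
# Sketch of (a) uncertainty-weighted Bellman targets and (b) UCB action selection
# from a Q-ensemble, in the spirit of the description above.
import torch

def weighted_bellman_targets(rewards, next_q_mean, next_q_std, done,
                             gamma=0.99, temperature=10.0):
    """Targets whose next-state Q-estimates disagree across the ensemble get lower weight."""
    targets = rewards + gamma * (1.0 - done) * next_q_mean
    weights = torch.sigmoid(-next_q_std * temperature) + 0.5  # in (0.5, 1.0)
    return targets, weights  # weights multiply the per-sample TD error in each critic loss

def ucb_action(q_values_per_member, lam=1.0):
    """Pick the discrete action maximizing ensemble mean + lam * ensemble std."""
    q = torch.stack(q_values_per_member)            # [ensemble_size, num_actions]
    score = q.mean(dim=0) + lam * q.std(dim=0)
    return int(score.argmax().item())
```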
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning [22.319208517053816]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning techniques.
HRL often suffers from training inefficiency as the high-level action space, i.e., the goal space, is often large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
arXiv Detail & Related papers (2020-06-20T03:34:45Z)