Adversarially Guided Subgoal Generation for Hierarchical Reinforcement
Learning
- URL: http://arxiv.org/abs/2201.09635v1
- Date: Mon, 24 Jan 2022 12:30:38 GMT
- Title: Adversarially Guided Subgoal Generation for Hierarchical Reinforcement
Learning
- Authors: Vivienne Huiling Wang, Joni Pajarinen, Tinghuai Wang, Joni Kämäräinen
- Abstract summary: We propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy.
Experiments with state-of-the-art algorithms show that our approach significantly improves learning efficiency and overall performance of HRL in various challenging continuous control tasks.
- Score: 5.514236598436977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks
by performing decision-making and control at successively higher levels of
temporal abstraction. However, off-policy training in HRL often suffers from
the problem of non-stationary high-level decision making since the low-level
policy is constantly changing. In this paper, we propose a novel HRL approach
for mitigating the non-stationarity by adversarially enforcing the high-level
policy to generate subgoals compatible with the current instantiation of the
low-level policy. In practice, the adversarial learning can be implemented by
training, concurrently with the high-level policy, a simple discriminator network
that determines the compatibility level of subgoals. Experiments with
state-of-the-art algorithms show that our approach significantly improves
learning efficiency and overall performance of HRL in various challenging
continuous control tasks.
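To make the mechanism concrete, here is a minimal sketch of the adversarial compatibility idea in PyTorch. It assumes a discriminator over (state, subgoal) pairs trained with relabelled reached states as positives and high-level proposals as negatives; all module names, dimensions, and the labelling rule are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: a discriminator scores (state, subgoal) compatibility, and the
# high-level policy receives an auxiliary term pushing its subgoals toward the region
# the discriminator judges compatible with the current low-level policy.
import torch
import torch.nn as nn

STATE_DIM, SUBGOAL_DIM = 17, 3  # illustrative dimensions

class CompatibilityDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + SUBGOAL_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, subgoal):
        # Returns the logit of "this subgoal is compatible with the low-level policy".
        return self.net(torch.cat([state, subgoal], dim=-1))

disc = CompatibilityDiscriminator()
disc_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(state, reached_subgoal, proposed_subgoal):
    """Positives: subgoals the low-level policy actually reached (hindsight relabelling).
    Negatives: subgoals currently proposed by the high-level policy."""
    pos = disc(state, reached_subgoal)
    neg = disc(state, proposed_subgoal.detach())
    loss = bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()
    return loss.item()

def compatibility_penalty(state, proposed_subgoal):
    """Auxiliary loss added to the high-level policy objective: lower when the
    generated subgoal looks compatible to the discriminator."""
    return -disc(state, proposed_subgoal).mean()
```

In training, `compatibility_penalty` would be weighted and added to the usual high-level actor loss, so the subgoal generator and the discriminator are updated concurrently.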
Related papers
- Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate the strong performance of HPO, with an improvement of up to 35% over the baselines.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instanced at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy.
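The alternation can be sketched as a simple training loop; the learner interface below (`freeze`, `unfreeze`, `train`) is a hypothetical placeholder, not the paper's algorithm or API.

```python
# Schematic of the alternating scheme: optimize the high level over the induced SMDP
# with the option (low-level) policies frozen, then optimize the option policies
# against a fixed high-level policy. All objects and methods are assumed placeholders.
def alternating_training(high_learner, option_learners, env, phases=10, steps_per_phase=10_000):
    for _ in range(phases):
        # High level: options act as temporally extended SMDP actions, low level frozen.
        for opt in option_learners:
            opt.freeze()
        high_learner.train(env, steps=steps_per_phase)

        # Low level: learn inner option policies while the high-level policy is fixed.
        high_learner.freeze()
        for opt in option_learners:
            opt.unfreeze()
            opt.train(env, steps=steps_per_phase)
        high_learner.unfreeze()
    return high_learner, option_learners
```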
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF [82.73541793388]
We introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation.
We provide theoretical studies of the problem landscape and its penalty-based gradient (policy) algorithms.
We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
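In generic notation, the penalty reformulation replaces the bilevel constraint with an additive penalty on lower-level suboptimality; this is an illustrative form, not necessarily the exact objective analyzed in the paper:

```latex
% Bilevel RL problem and its penalized single-level surrogate (illustrative notation)
\begin{aligned}
\text{bilevel:}\quad & \min_{\theta}\; f\big(\theta, \pi^{*}(\theta)\big)
  \quad \text{s.t.}\quad \pi^{*}(\theta) \in \arg\max_{\pi}\, g(\theta, \pi),\\
\text{penalized:}\quad & \min_{\theta,\,\pi}\; f(\theta, \pi)
  \;+\; \lambda \Big( \max_{\pi'} g(\theta, \pi') \;-\; g(\theta, \pi) \Big),
\end{aligned}
```

where $\lambda > 0$ controls how strongly the lower-level policy $\pi$ is pushed toward optimality for its own objective $g$.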
arXiv Detail & Related papers (2024-02-10T04:54:15Z)
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm that leverages recent advances in the transformer architecture for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout [16.454305212398328]
We propose a goal-conditioned hierarchical reinforcement learning (HRL) framework named Guided Cooperation via Model-based Rollout (GCMR).
GCMR aims to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics.
Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of HIGL, namely ACLG, yields more stable and robust policy improvement.
arXiv Detail & Related papers (2023-09-24T00:13:16Z)
- Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical reinforcement learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
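A timed subgoal can be illustrated as a small data structure that carries both the target state and its deadline; the field names and the success test below are assumptions for illustration, not the HiTS interface.

```python
# Illustrative timed subgoal: the high level specifies *what* goal state to reach
# and *when*; the low level conditions on the remaining time budget.
from dataclasses import dataclass
import numpy as np

@dataclass
class TimedSubgoal:
    goal: np.ndarray   # desired (sub)state to reach
    deadline: int      # environment step at which it should be reached

def low_level_observation(state: np.ndarray, sg: TimedSubgoal, t: int) -> np.ndarray:
    """Low-level input = current state, goal, and time remaining until the deadline."""
    time_left = max(sg.deadline - t, 0)
    return np.concatenate([state, sg.goal, np.array([time_left], dtype=float)])

def subgoal_achieved(state: np.ndarray, sg: TimedSubgoal, t: int, tol: float = 0.1) -> bool:
    """Success requires reaching the goal at (approximately) the specified time."""
    return t == sg.deadline and np.linalg.norm(state[: sg.goal.shape[0]] - sg.goal) < tol
```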
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- Adjacency constraint for efficient hierarchical reinforcement learning [25.15808501708926]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques.
HRL often suffers from training inefficiency as the high-level action space, i.e., the goal space, is large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
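As a rough illustration, the restriction can be pictured as projecting raw high-level outputs back into a region reachable within $k$ low-level steps; the Euclidean-ball proxy for adjacency used below is a simplification for the sketch, whereas the paper characterizes adjacency more carefully.

```python
# Project a proposed subgoal into a crude proxy for the k-step adjacent region of the
# current state. The Euclidean-ball proxy and max_step bound are assumptions.
import numpy as np

def restrict_to_adjacent_region(state: np.ndarray, raw_subgoal: np.ndarray,
                                max_step: float, k: int) -> np.ndarray:
    """Keep the subgoal within roughly k low-level steps of `state`."""
    radius = k * max_step                   # upper bound on how far k steps can travel
    delta = raw_subgoal - state
    dist = np.linalg.norm(delta)
    if dist <= radius:
        return raw_subgoal
    return state + delta * (radius / dist)  # scale back onto the adjacency ball
```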
arXiv Detail & Related papers (2021-10-30T09:26:45Z)
- Efficient Hierarchical Exploration with Stable Subgoal Representation Learning [26.537055962523162]
We propose a state-specific regularization that stabilizes subgoal embeddings in well-explored areas.
We develop an efficient hierarchical exploration strategy that actively seeks out new promising subgoals and states.
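One way to picture a state-specific stabilizer is to anchor the embeddings of frequently visited states to an earlier snapshot of the encoder while leaving rarely visited states free to move; the weighting rule and interfaces below are assumptions for illustration, not the paper's regularizer.

```python
# Hypothetical state-specific stability regularizer for subgoal embeddings.
import torch

def stability_regularizer(encoder, old_encoder, states, visit_counts, scale=1.0):
    with torch.no_grad():
        anchors = old_encoder(states)                 # embeddings from a frozen snapshot
    weights = 1.0 - torch.exp(-scale * visit_counts)  # ~1 in well-explored areas, ~0 otherwise
    drift = ((encoder(states) - anchors) ** 2).sum(dim=-1)
    return (weights * drift).mean()
```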
arXiv Detail & Related papers (2021-05-31T07:28:59Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
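The two ingredients can be sketched as follows, with assumed tensor shapes and a discrete action space for the UCB part; the exact weighting schedule in the paper may differ.

```python
# Sketch of (a) uncertainty-weighted Bellman targets and (b) UCB action selection
# from a Q-ensemble, in the spirit of the description above.
import torch

def weighted_bellman_targets(rewards, next_q_mean, next_q_std, done,
                             gamma=0.99, temperature=10.0):
    """Targets whose next-state Q-estimates disagree across the ensemble get lower weight."""
    targets = rewards + gamma * (1.0 - done) * next_q_mean
    weights = torch.sigmoid(-next_q_std * temperature) + 0.5  # in (0.5, 1.0)
    return targets, weights  # weights multiply the per-sample TD error in each critic loss

def ucb_action(q_values_per_member, lam=1.0):
    """Pick the discrete action maximizing ensemble mean + lam * ensemble std."""
    q = torch.stack(q_values_per_member)            # [ensemble_size, num_actions]
    score = q.mean(dim=0) + lam * q.std(dim=0)
    return int(score.argmax().item())
```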
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning [22.319208517053816]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning techniques.
HRL often suffers from training inefficiency as the high-level action space, i.e., the goal space, is often large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
arXiv Detail & Related papers (2020-06-20T03:34:45Z)