Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies
- URL: http://arxiv.org/abs/2406.18053v1
- Date: Wed, 26 Jun 2024 04:05:04 GMT
- Title: Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies
- Authors: Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan
- Abstract summary: Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by decomposing them into subgoals.
We propose the Bidirectional-reachable Hierarchical Policy Optimization (BrHPO), a simple yet effective algorithm that is also computationally efficient.
Experimental results on a variety of long-horizon tasks show that BrHPO outperforms other state-of-the-art HRL baselines while achieving significantly higher exploration efficiency and robustness.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by decomposing them into subgoals, so its effectiveness is greatly influenced by subgoal reachability. Typical HRL methods consider subgoal reachability only unilaterally, with a dominant level enforcing compliance on the subordinate level. However, we observe that when the dominant level becomes trapped in local exploration or generates unattainable subgoals, the subordinate level is negatively affected and cannot follow the dominant level's actions. This can leave both levels stuck in local optima, ultimately hindering subsequent subgoal reachability. Allowing real-time bilateral information sharing and error correction would be a natural cure for this issue, which motivates us to propose a mutual response mechanism. Based on this, we propose the Bidirectional-reachable Hierarchical Policy Optimization (BrHPO), a simple yet effective algorithm that is also computationally efficient. Experimental results on a variety of long-horizon tasks show that BrHPO outperforms other state-of-the-art HRL baselines while achieving significantly higher exploration efficiency and robustness.
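As a rough illustration of the mutual response idea described above (not the authors' actual formulation), the sketch below computes a single subgoal-reachability signal from a low-level rollout and feeds it to both levels: the high level is penalized for emitting unreachable subgoals, while the low level is rewarded for the progress it made. The function names, the progress-based reachability measure, and the linear bonus shaping are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of sharing one subgoal-reachability
# signal between both levels of a hierarchical policy.
import numpy as np

def reachability(states, subgoal):
    """Fraction of progress the low-level rollout made toward the subgoal.

    `states` is the low-level trajectory produced while pursuing `subgoal`;
    both live in the same goal space for simplicity.
    """
    start_dist = np.linalg.norm(subgoal - states[0])
    final_dist = np.linalg.norm(subgoal - states[-1])
    return np.clip((start_dist - final_dist) / (start_dist + 1e-8), 0.0, 1.0)

def mutual_response_bonuses(states, subgoal, scale=1.0):
    """Return (high_level_penalty, low_level_bonus) from the shared signal.

    A low reachability score simultaneously penalizes the high level for an
    unattainable subgoal and rewards the low level in proportion to the
    progress it did make, so the two levels correct each other.
    """
    r = reachability(states, subgoal)
    high_penalty = -scale * (1.0 - r)   # discourage unreachable subgoals
    low_bonus = scale * r               # encourage subgoal compliance
    return high_penalty, low_bonus

# Toy usage: a 2-D navigation segment where the subgoal was only half reached.
traj = np.array([[0.0, 0.0], [0.4, 0.1], [0.9, 0.4]])
print(mutual_response_bonuses(traj, subgoal=np.array([2.0, 0.0])))
```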
Related papers
- Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate the strong performance of HPO, which improves over the baselines by up to 35%.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF [82.73541793388]
We introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation.
We provide theoretical studies of the problem landscape and its penalty-based gradient (policy) algorithms.
We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
arXiv Detail & Related papers (2024-02-10T04:54:15Z)
- Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout [16.454305212398328]
We propose a goal-conditioned hierarchical reinforcement learning (HRL) framework named Guided Cooperation via Model-based Rollout (GCMR).
GCMR aims to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics (see the sketch below).
Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of HIGL, namely ACLG, yields more stable and robust policy improvement.
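One plausible reading of "exploiting forward dynamics" is to roll the low-level policy out inside a learned model so the high level can gauge whether a candidate subgoal is reachable before committing to it. The sketch below is such a hypothetical model-based rollout, not the GCMR implementation; the linear dynamics stand-in, the horizon, and the toy controller are assumptions.

```python
# Minimal sketch: imagined low-level rollout through a learned forward model
# to estimate how close the agent would get to a proposed subgoal.
import numpy as np

class LinearDynamicsModel:
    """Stand-in for a learned forward model: s' = s + A @ a."""
    def __init__(self, action_gain):
        self.action_gain = action_gain

    def predict(self, state, action):
        return state + self.action_gain @ action

def imagined_rollout(model, low_policy, state, subgoal, horizon=5):
    """Roll the low-level policy through the model; report the final gap to the subgoal."""
    for _ in range(horizon):
        action = low_policy(state, subgoal)
        state = model.predict(state, action)
    return np.linalg.norm(subgoal - state)

# Toy usage: a proportional low-level controller in a 2-D goal space.
model = LinearDynamicsModel(action_gain=0.5 * np.eye(2))
low_policy = lambda s, g: np.clip(g - s, -1.0, 1.0)
gap = imagined_rollout(model, low_policy, np.zeros(2), np.array([3.0, -1.0]))
print("imagined final distance to subgoal:", gap)
```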
arXiv Detail & Related papers (2023-09-24T00:13:16Z)
- Emergency action termination for immediate reaction in hierarchical reinforcement learning [8.637919344171255]
We propose a method in which the validity of higher-level actions (and thus of lower-level goals) is constantly verified at the higher level.
If these actions, i.e., the lower-level goals, become inadequate, they are replaced by more appropriate ones.
In this way, we combine the fast training of hierarchical RL with the immediate reactivity of flat RL (a simple check-and-replace loop is sketched below).
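A minimal sketch of such an emergency check-and-replace loop, not the paper's code: the value-based validity test, the threshold, and the toy policies are illustrative assumptions.

```python
# Minimal sketch: re-check the current lower-level goal every step and replace
# it immediately if it no longer looks adequate.
import numpy as np

def goal_still_valid(high_value_fn, state, goal, threshold=0.0):
    """The higher level keeps a goal only while it still looks worthwhile."""
    return high_value_fn(state, goal) > threshold

def step_hierarchy(high_policy, low_policy, high_value_fn, state, goal):
    """One environment step with emergency goal termination."""
    if not goal_still_valid(high_value_fn, state, goal):
        goal = high_policy(state)   # replace the inadequate goal now,
                                    # instead of waiting for the horizon to end
    action = low_policy(state, goal)
    return action, goal

# Toy usage: goals behind the agent are judged inadequate and replaced.
high_value_fn = lambda s, g: float(np.dot(g - s, np.ones_like(s)))  # crude score
high_policy = lambda s: s + np.array([1.0, 0.0])                    # step forward
low_policy = lambda s, g: np.clip(g - s, -1.0, 1.0)
action, goal = step_hierarchy(high_policy, low_policy, high_value_fn,
                              state=np.zeros(2), goal=np.array([-2.0, 0.0]))
print(action, goal)
```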
arXiv Detail & Related papers (2022-11-11T16:56:02Z)
- Adversarially Guided Subgoal Generation for Hierarchical Reinforcement Learning [5.514236598436977]
We propose a novel HRL approach that mitigates non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy (sketched below).
Experiments with state-of-the-art algorithms show that our approach significantly improves learning efficiency and overall performance of HRL in various challenging continuous control tasks.
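The sketch below illustrates one way an adversarial compatibility signal could be wired in: a small discriminator is trained to tell reachable subgoals from unreachable ones, and its score is subtracted from the high-level reward. This is a hedged approximation rather than the paper's method; the logistic discriminator and the penalty weight are assumptions.

```python
# Minimal sketch: a discriminator-style "compatibility" penalty that pushes
# high-level subgoals toward regions the current low-level policy can reach.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CompatibilityDiscriminator:
    """Scores subgoals: ~1 if they look reachable by the low level, ~0 if not."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def score(self, subgoal):
        return sigmoid(self.w @ subgoal + self.b)

    def update(self, subgoal, reached):
        """One logistic-regression step on a (subgoal, reached-or-not) label."""
        err = float(reached) - self.score(subgoal)
        self.w += self.lr * err * subgoal
        self.b += self.lr * err

def high_level_reward(env_reward, disc, subgoal, penalty_weight=1.0):
    """Environment reward minus an adversarial incompatibility penalty."""
    return env_reward - penalty_weight * (1.0 - disc.score(subgoal))

# Toy usage: unreachable subgoals end up with a lower effective high-level reward.
disc = CompatibilityDiscriminator(dim=2)
disc.update(np.array([0.5, 0.0]), reached=True)
disc.update(np.array([5.0, 5.0]), reached=False)
print(high_level_reward(1.0, disc, np.array([5.0, 5.0])))
```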
arXiv Detail & Related papers (2022-01-24T12:30:38Z)
- Adjacency constraint for efficient hierarchical reinforcement learning [25.15808501708926]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques.
However, HRL often suffers from training inefficiency because the action space of the high level, i.e., the goal space, is large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
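A minimal sketch of the adjacency idea follows, assuming a simple per-step displacement bound in place of a learned adjacency measure: subgoals proposed outside the approximate $k$-step region are projected back onto its boundary.

```python
# Minimal sketch: clip a proposed subgoal into an approximate k-step adjacent
# region of the current state.
import numpy as np

def project_to_k_step_region(state, subgoal, k, max_step):
    """Shrink the subgoal toward the state until it lies within k steps' reach."""
    offset = subgoal - state
    radius = k * max_step                  # crude proxy for k-step adjacency
    dist = np.linalg.norm(offset)
    if dist <= radius:
        return subgoal
    return state + offset * (radius / dist)

# Toy usage: a far-away subgoal is pulled back to the k-step boundary.
print(project_to_k_step_region(np.zeros(2), np.array([10.0, 0.0]),
                               k=5, max_step=0.5))
```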
arXiv Detail & Related papers (2021-10-30T09:26:45Z)
- Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning [64.97599673479678]
We present HIerarchical reinforcement learning Guided by Landmarks (HIGL).
HIGL is a novel framework for training a high-level policy with a reduced action space guided by landmarks.
Our experiments demonstrate that our framework outperforms prior art across a variety of control tasks.
arXiv Detail & Related papers (2021-10-26T12:16:19Z)
- Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where the learner learns a latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
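Both ingredients can be sketched with a toy discrete-action Q-ensemble, as below. This is a simplified illustration rather than the SUNRISE implementation; in particular, the exact form of the uncertainty weight and the exploration coefficient are assumptions.

```python
# Minimal sketch of (a) uncertainty-weighted Bellman targets and
# (b) UCB action selection from a Q-ensemble over discrete actions.
import numpy as np

def ucb_action(q_ensemble, state_idx, lam=1.0):
    """Pick the action with the highest mean + lambda * std over the ensemble."""
    q_values = q_ensemble[:, state_idx, :]              # (ensemble, actions)
    ucb = q_values.mean(axis=0) + lam * q_values.std(axis=0)
    return int(np.argmax(ucb))

def weighted_bellman_target(reward, next_q_ensemble, gamma=0.99, temperature=10.0):
    """Down-weight targets whose next-state Q estimates disagree across the ensemble."""
    target = reward + gamma * next_q_ensemble.mean()
    # sigmoid of the negative ensemble std; lies in (0, 0.5] for std >= 0
    weight = 1.0 / (1.0 + np.exp(temperature * next_q_ensemble.std()))
    return weight * target

# Toy usage: 3 ensemble members, 2 states, 4 actions.
rng = np.random.default_rng(0)
q_ensemble = rng.normal(size=(3, 2, 4))
print(ucb_action(q_ensemble, state_idx=0))
print(weighted_bellman_target(1.0, next_q_ensemble=q_ensemble[:, 1, 2]))
```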
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning [22.319208517053816]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning techniques.
However, HRL often suffers from training inefficiency because the action space of the high level, i.e., the goal space, is often large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
arXiv Detail & Related papers (2020-06-20T03:34:45Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noise.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques for improving the robustness of classification tasks, such as adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)