DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical
Reinforcement Learning
- URL: http://arxiv.org/abs/2210.05150v1
- Date: Tue, 11 Oct 2022 05:09:34 GMT
- Authors: Seungjae Lee, Jigang Kim, Inkyu Jang, H. Jin Kim
- Abstract summary: Hierarchical Reinforcement Learning (HRL) has made notable progress in complex control tasks by leveraging temporal abstraction.
Previous HRL algorithms often suffer from serious data inefficiency as environments get large.
We present a method of Decoupling Horizons Using a Graph in Hierarchical Reinforcement Learning (DHRL) which can alleviate this problem.
- Score: 26.973783464706447
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Hierarchical Reinforcement Learning (HRL) has made notable progress in
complex control tasks by leveraging temporal abstraction. However, previous HRL
algorithms often suffer from serious data inefficiency as environments get
large. The extended components, i.e., the goal space and the episode length,
impose a burden on one or both of the high-level and low-level policies, since
both levels share the total horizon of the episode. In this paper, we present a
method of Decoupling Horizons Using a Graph in Hierarchical Reinforcement
Learning (DHRL) which can alleviate this problem by decoupling the horizons of
high-level and low-level policies and bridging the gap between the length of
both horizons using a graph. DHRL provides a freely stretchable high-level
action interval, which facilitates longer temporal abstraction and faster
training in complex tasks. Our method outperforms state-of-the-art HRL
algorithms in typical HRL environments. Moreover, DHRL solves long and
complex locomotion and manipulation tasks.
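The core idea of the abstract — a graph that bridges the gap between a distant high-level subgoal and the short horizon of the low-level policy — can be illustrated with a minimal sketch. This is not the paper's actual algorithm; the landmark graph, its node names, and the BFS planner below are illustrative assumptions standing in for DHRL's learned graph and distance estimates.

```python
from collections import deque

def shortest_waypoints(graph, start, goal):
    """BFS over an (assumed) landmark graph; returns the waypoint
    sequence from start to goal, or None if unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Reconstruct the path back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None

# Toy landmark graph: the high level proposes a distant subgoal ("D"),
# and the graph decomposes it into short hops ("A"->"B"->"C"->"D")
# that a short-horizon low-level policy could plausibly reach.
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(shortest_waypoints(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```

The point of the sketch is the decoupling: the high-level action interval can stretch over many graph edges while the low level only ever pursues the next adjacent waypoint.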
Related papers
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation [8.7216199131049]
HeRD is a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation.
arXiv Detail & Related papers (2025-12-10T21:40:22Z) - h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning [22.930073904843212]
Large language models excel at short-horizon reasoning tasks, but performance drops as reasoning horizon lengths increase. Existing approaches to combat this rely on inference-time scaffolding or costly step-level supervision. We introduce a scalable method to bootstrap long-horizon reasoning capabilities using only existing, abundant short-horizon data.
arXiv Detail & Related papers (2025-10-08T17:58:41Z) - SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning [81.7764584515496]
Vision-Language-Action (VLA) models have emerged as a powerful paradigm for robotic manipulation. These models face two fundamental challenges: scarcity and high cost of large-scale human-operated robotic trajectories. We introduce SimpleVLA-RL, an efficient reinforcement learning framework tailored for VLA models.
arXiv Detail & Related papers (2025-09-11T17:59:17Z) - Horizon Reduction Makes RL Scalable [78.67071359991218]
We study the scalability of offline reinforcement learning (RL) algorithms. We use datasets up to 1000x larger than typical offline RL datasets. We show that horizon is the main cause behind the poor scaling of offline RL.
arXiv Detail & Related papers (2025-06-04T17:06:54Z) - StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs).
StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks.
Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z) - Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion [62.91968752955649]
This paper tackles a novel problem, extendable long-horizon planning: enabling agents to plan trajectories longer than those in training data without compounding errors.
We propose an augmentation method that iteratively generates longer trajectories by stitching shorter ones.
HM-Diffuser trains on these extended trajectories using a hierarchical structure, efficiently handling tasks across multiple temporal scales.
arXiv Detail & Related papers (2025-03-25T22:52:46Z) - HG2P: Hippocampus-inspired High-reward Graph and Model-Free Q-Gradient Penalty for Path Planning and Motion Control [12.49955844499153]
Goal-conditioned hierarchical reinforcement learning (HRL) decomposes complex reaching tasks into a sequence of simple subgoal-conditioned tasks.
This paper bridges the goal-conditioned HRL based on graph-based planning to brain mechanisms, proposing a hippocampus-striatum-like dual-controller hypothesis.
arXiv Detail & Related papers (2024-10-12T11:46:31Z) - Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies [26.915223518488016]
Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by decomposing them into subgoals.
We propose the Bidirectional-reachable Hierarchical Policy Optimization (BrHPO), a simple yet effective algorithm that also enjoys computational efficiency.
Experiment results on a variety of long-horizon tasks showcase that BrHPO outperforms other state-of-the-art HRL baselines, coupled with a significantly higher exploration efficiency and robustness.
arXiv Detail & Related papers (2024-06-26T04:05:04Z) - PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer [47.924941959320996]
We propose a hierarchical planner designed for offline RL called PlanDQ.
PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals.
At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals.
arXiv Detail & Related papers (2024-06-10T20:59:53Z) - Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity [4.917399520581689]
Bilevel reinforcement learning (RL) features two intertwined levels of optimization problems.
We characterize the hyper-gradient without assuming lower-level convexity.
We propose both model-based and model-free bilevel reinforcement learning algorithms.
arXiv Detail & Related papers (2024-05-30T05:24:20Z) - RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks.
This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z) - Hierarchical Reinforcement Learning for Power Network Topology Control [22.203574989348773]
Learning in high-dimensional action spaces is a key challenge in applying reinforcement learning to real-world systems.
In this paper, we study the possibility of controlling power networks using RL methods.
arXiv Detail & Related papers (2023-11-03T12:33:00Z) - Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch
Size [58.762959061522736]
We show that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude.
We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time.
arXiv Detail & Related papers (2022-11-20T21:48:25Z) - Learning to Solve Combinatorial Graph Partitioning Problems via
Efficient Exploration [72.15369769265398]
Experimentally, ECORD achieves a new SOTA for RL algorithms on the Maximum Cut problem.
Compared to the nearest competitor, ECORD reduces the optimality gap by up to 73%.
arXiv Detail & Related papers (2022-05-27T17:13:10Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z) - Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where the learner learns latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z) - Hierarchical Reinforcement Learning with Optimal Level Synchronization based on a Deep Generative Model [4.266866385061998]
One of the HRL issues is how to train each level policy with the optimal data collection from its experience.
We propose a novel HRL model supporting the optimal level synchronization using the off-policy correction technique with a deep generative model.
arXiv Detail & Related papers (2021-07-17T05:02:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.