Related papers: Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion

URL: http://arxiv.org/abs/2602.02722v1
Date: Mon, 02 Feb 2026 19:40:54 GMT
Title: Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
Authors: Dan Haramati, Carl Qi, Tal Daniel, Amy Zhang, Aviv Tamar, George Konidaris
Abstract summary: We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL). This framework combines subgoal decomposition with factored structure to solve long-horizon tasks in domains with multiple entities. We show that our method consistently boosts performance of the underlying RL agent on image-based long-horizon tasks with sparse rewards.
Score: 36.28452252200851
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL) that combines subgoal decomposition with factored structure to solve long-horizon tasks in domains with multiple entities. Achieving long-horizon goals in complex environments remains a core challenge in Reinforcement Learning (RL). Domains with multiple entities are particularly difficult due to their combinatorial complexity. GCRL facilitates generalization across goals and the use of subgoal structure, but struggles with high-dimensional observations and combinatorial state-spaces, especially under sparse reward. We employ a two-level hierarchy composed of a value-based GCRL agent and a factored subgoal-generating conditional diffusion model. The RL agent and subgoal generator are trained independently and composed post hoc through selective subgoal generation based on the value function, making the approach modular and compatible with existing GCRL algorithms. We introduce new variations to benchmark tasks that highlight the challenges of multi-entity domains, and show that our method consistently boosts performance of the underlying RL agent on image-based long-horizon tasks with sparse rewards, achieving over 150% higher success rates on the hardest task in our suite and generalizing to increasing horizons and numbers of entities. Rollout videos are provided at: https://sites.google.com/view/hecrl
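The abstract says the RL agent and the subgoal generator are composed post hoc through selective subgoal generation based on the value function, but it does not state the selection rule. The following is a minimal sketch of one plausible rule, not the authors' method. Every name in it (`toy_value`, `select_subgoal`, `reach_threshold`) is hypothetical, the Euclidean-distance value function is a stand-in for a learned goal-conditioned value, and the candidate list stands in for samples drawn from the diffusion model.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def toy_value(state: Sequence[float], goal: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned goal-conditioned value.

    Higher (closer to zero) means the goal is judged easier to reach.
    """
    return -math.dist(state, goal)


def select_subgoal(
    state: Point,
    goal: Point,
    candidates: Sequence[Point],
    value: Callable[[Sequence[float], Sequence[float]], float] = toy_value,
    reach_threshold: float = -1.0,  # assumed cutoff for "reachable"
) -> Point:
    """Pick the target the low-level agent should pursue next.

    Assumed rule (an illustration only):
      1. If the final goal already looks reachable, use it directly.
      2. Otherwise keep the candidates that look reachable from `state`
         and choose the one with the highest value toward the final goal.
      3. If no candidate looks reachable, fall back to the easiest one.
    """
    if value(state, goal) >= reach_threshold:
        return tuple(goal)
    reachable = [c for c in candidates if value(state, c) >= reach_threshold]
    if not reachable:
        return max(candidates, key=lambda c: value(state, c))
    return max(reachable, key=lambda c: value(c, goal))


if __name__ == "__main__":
    start, far_goal = (0.0, 0.0), (10.0, 0.0)
    sampled = [(0.5, 0.0), (0.9, 0.0), (5.0, 0.0)]  # stand-in for generator samples
    print(select_subgoal(start, far_goal, sampled))
```

Because the rule only queries a value function and a list of sampled candidates, either component can be swapped without retraining the other, which is the modularity the abstract emphasizes.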

Related papers

Integrating Diverse Assignment Strategies into DETRs [61.61489761918158]
Label assignment is a critical component in object detectors, particularly within DETR-style frameworks. We propose LoRA-DETR, a flexible and lightweight framework that seamlessly integrates diverse assignment strategies into any DETR-style detector.
arXiv Detail & Related papers (2026-01-14T07:28:54Z)
Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation [8.7216199131049]
HeRD is a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation.
arXiv Detail & Related papers (2025-12-10T21:40:22Z)
HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents [29.437416274639165]
HERAKLES is a framework that enables a two-level hierarchical autotelic agent to continuously compile mastered goals into a low-level policy. We show that it scales effectively with goal complexity, improves sample efficiency through skill compilation, and enables the agent to adapt robustly to novel challenges over time.
arXiv Detail & Related papers (2025-08-20T14:50:28Z)
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning [32.260964481673085]
Large language models (LLMs) struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment. We propose an innovative framework that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy.
arXiv Detail & Related papers (2025-05-26T09:43:40Z)
Flattening Hierarchies with Policy Bootstrapping [5.528896840956629]
We introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces.
arXiv Detail & Related papers (2025-05-20T23:31:30Z)
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning [125.96848846966087]
Training large language models (LLMs) as interactive agents presents unique challenges. While reinforcement learning has enabled progress in static tasks, multi-turn agent RL training remains underexplored. We propose StarPO, a general framework for trajectory-level agent RL, and introduce RAGEN, a modular system for training and evaluating LLM agents.
arXiv Detail & Related papers (2025-04-24T17:57:08Z)
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning [88.55095746156428]
Retrieval-augmented generation (RAG) is widely utilized to incorporate external knowledge into large language models. A standard RAG pipeline consists of several components, such as query rewriting, document retrieval, document filtering, and answer generation. We propose treating the complex RAG pipeline with multiple components as a multi-agent cooperative task, in which each component can be regarded as an RL agent.
arXiv Detail & Related papers (2025-01-25T14:24:50Z)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models. Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks. This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z)
Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z)
Room Clearance with Feudal Hierarchical Reinforcement Learning [2.867517731896504]
We introduce a new simulation environment, "it", designed as a tool to build scenarios that can drive RL research in a direction useful for military analysis. We focus on an abstracted and simplified room clearance scenario, where a team of blue agents have to make their way through a building and ensure that all rooms are cleared of enemy red agents. We implement a multi-agent version of feudal hierarchical RL that introduces a command hierarchy where a commander at the higher level sends orders to multiple agents at the lower level who simply have to learn to follow these orders. We find that breaking the task down in this way allows us to
arXiv Detail & Related papers (2021-05-24T15:05:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.