HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents
- URL: http://arxiv.org/abs/2508.14751v1
- Date: Wed, 20 Aug 2025 14:50:28 GMT
- Title: HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents
- Authors: Thomas Carta, Clément Romac, Loris Gaven, Pierre-Yves Oudeyer, Olivier Sigaud, Sylvain Lamprier
- Abstract summary: HERAKLES is a framework that enables a two-level hierarchical autotelic agent to continuously compile mastered goals into a low-level policy. We show that it scales effectively with goal complexity, improves sample efficiency through skill compilation, and enables the agent to adapt robustly to novel challenges over time.
- Score: 29.437416274639165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-ended AI agents need to be able to efficiently learn goals of increasing complexity, abstraction and heterogeneity over their lifetime. Beyond efficiently sampling their own goals, autotelic agents specifically need to be able to keep the growing complexity of goals under control, limiting the associated growth in sample and computational complexity. To address this challenge, recent approaches have leveraged hierarchical reinforcement learning (HRL) and language, capitalizing on its compositional and combinatorial generalization capabilities to acquire temporally extended reusable behaviours. Existing approaches use expert-defined spaces of subgoals over which they instantiate a hierarchy, and often assume pre-trained associated low-level policies. Such designs are inadequate in open-ended scenarios, where goal spaces naturally diversify across a broad spectrum of difficulties. We introduce HERAKLES, a framework that enables a two-level hierarchical autotelic agent to continuously compile mastered goals into the low-level policy, executed by a small, fast neural network, dynamically expanding the set of subgoals available to the high-level policy. We train a Large Language Model (LLM) to serve as the high-level controller, exploiting its strengths in goal decomposition and generalization to operate effectively over this evolving subgoal space. We evaluate HERAKLES in the open-ended Crafter environment and show that it scales effectively with goal complexity, improves sample efficiency through skill compilation, and enables the agent to adapt robustly to novel challenges over time.
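The abstract's core loop can be made concrete with a minimal sketch. The sketch below is an illustration of the general idea, not HERAKLES itself: the class, method names, the stubbed LLM decomposition, the simulated practice dynamics, and the mastery threshold are all hypothetical stand-ins. It shows a two-level agent where a high-level controller decomposes goals into subgoals, a low-level policy executes them, and subgoals that reach a success-rate threshold are "compiled" into the low-level repertoire, expanding the action set available to the high level.

```python
import random


class HierarchicalAutotelicAgent:
    """Minimal sketch of a two-level autotelic loop with skill compilation.

    The high-level controller (an LLM in HERAKLES; a trivial stub here)
    decomposes a composite goal into subgoals. Subgoals whose running
    success rate crosses a threshold are compiled into the fast low-level
    policy's repertoire. All dynamics below are simulated placeholders.
    """

    MASTERY_THRESHOLD = 0.8  # hypothetical success-rate cutoff
    EMA_ALPHA = 0.3          # weight of the latest outcome in the estimate

    def __init__(self, primitive_skills):
        self.compiled_skills = set(primitive_skills)  # low-level repertoire
        self.success_rate = {}    # per-subgoal running success estimate
        self.practice_count = {}  # how often each subgoal was attempted

    def high_level_decompose(self, goal):
        # Stub for the LLM controller: a composite goal is written as
        # subgoals joined by '+', e.g. "chop+craft".
        return goal.split("+")

    def low_level_execute(self, subgoal):
        # Stub dynamics: compiled skills succeed reliably; novel subgoals
        # start unreliable and improve with practice.
        n = self.practice_count.get(subgoal, 0)
        self.practice_count[subgoal] = n + 1
        p = 0.95 if subgoal in self.compiled_skills else min(0.9, 0.2 + 0.1 * n)
        return random.random() < p

    def attempt(self, goal):
        results = []
        for sub in self.high_level_decompose(goal):
            ok = self.low_level_execute(sub)
            results.append(ok)
            # Exponential moving average of success, then compile if mastered.
            r = self.success_rate.get(sub, 0.0)
            r = (1 - self.EMA_ALPHA) * r + self.EMA_ALPHA * (1.0 if ok else 0.0)
            self.success_rate[sub] = r
            if r >= self.MASTERY_THRESHOLD:
                self.compiled_skills.add(sub)
        return all(results)
```

The key dynamic this toy captures is that compilation shifts work from the expensive high level to the cheap low level over time: once a subgoal is compiled, the high-level controller can treat it as a single reliable action rather than re-decomposing it, which is the sample-efficiency argument the abstract makes.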
Related papers
- HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents [19.63866851076813]
HiMAC is a hierarchical agentic RL framework that decomposes long-horizon decision-making into macro-level planning and micro-level execution. Our results show that introducing structured hierarchy, rather than increasing model scale alone, is a key factor for enabling robust long-horizon agentic intelligence.
arXiv Detail & Related papers (2026-03-01T08:09:03Z) - Zero-Shot Instruction Following in RL via Structured LTL Representations [50.41415009303967]
We study instruction following in multi-task reinforcement learning, where an agent must zero-shot execute novel tasks not seen during training. In this setting, linear temporal logic has recently been adopted as a powerful framework for specifying structured, temporally extended tasks. While existing approaches successfully train generalist policies, they often struggle to effectively capture the rich logical and temporal structure inherent in specifications.
arXiv Detail & Related papers (2026-02-15T23:22:50Z) - SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations. An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z) - Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents [56.625878022978945]
Large Language Models (LLMs) as autonomous agents are increasingly tasked with solving complex, long-horizon problems. Direct Preference Optimization (DPO) provides a signal that is too coarse for precise credit assignment, while step-level DPO is often too myopic to capture the value of multi-step behaviors. We introduce Hierarchical Preference Learning (HPL), a hierarchical framework that optimizes LLM agents by leveraging preference signals at multiple, synergistic granularities.
arXiv Detail & Related papers (2025-09-26T08:43:39Z) - Reinforcement Learning with Anticipation: A Hierarchical Approach for Long-Horizon Tasks [3.79187263097166]
Solving long-horizon goal-conditioned tasks remains a significant challenge in reinforcement learning. We introduce Reinforcement Learning with Anticipation (RLA), a principled and potentially scalable framework designed to address these limitations. A key feature of RLA is the training of the anticipation model, which is guided by a principle of value geometric consistency.
arXiv Detail & Related papers (2025-09-06T00:10:15Z) - Generative World Models of Tasks: LLM-Driven Hierarchical Scaffolding for Embodied Agents [0.0]
We propose an effective world model for decision-making that models the world's physics and its task semantics. A systematic review of 2024 research in low-resource multi-agent soccer reveals a clear trend towards integrating symbolic and hierarchical methods. We formalize this trend into a framework for Hierarchical Task Environments (HTEs), which are essential for bridging the gap between simple, reactive behaviors and sophisticated, strategic team play.
arXiv Detail & Related papers (2025-09-05T01:03:51Z) - Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z) - Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning [32.260964481673085]
Large language models (LLMs) struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment. We propose an innovative framework that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy.
arXiv Detail & Related papers (2025-05-26T09:43:40Z) - Multi-Agent Collaboration via Evolving Orchestration [61.93162413517026]
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. We propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a central orchestrator dynamically directs agents in response to evolving task states. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs.
arXiv Detail & Related papers (2025-05-26T07:02:17Z) - Learning with Expert Abstractions for Efficient Multi-Task Continuous Control [5.796482272333648]
Decision-making in continuous multi-task environments is often hindered by the difficulty of obtaining accurate models for planning and the inefficiency of learning purely from trial and error. We propose a hierarchical reinforcement learning approach that addresses these limitations by dynamically planning over the expert-specified abstraction to generate subgoals to learn a goal-conditioned policy. Our empirical evaluation on a suite of procedurally generated continuous control environments demonstrates that our approach outperforms existing hierarchical reinforcement learning methods in terms of sample efficiency, task completion rate, scalability to complex tasks, and generalization to novel scenarios.
arXiv Detail & Related papers (2025-03-19T00:44:23Z) - MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint [36.970138281579686]
Hierarchical reinforcement learning (HRL) uses a hierarchical framework that divides tasks into subgoals and completes them sequentially. Current methods struggle to find suitable subgoals for ensuring a stable learning process. We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints.
arXiv Detail & Related papers (2024-02-22T03:11:09Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control [44.77500987121531]
Hierarchical Imitation Learning (HIL) has been proposed to recover highly-complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
arXiv Detail & Related papers (2022-10-05T00:28:26Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.