Reinforcement Learning with Anticipation: A Hierarchical Approach for Long-Horizon Tasks
- URL: http://arxiv.org/abs/2509.05545v1
- Date: Sat, 06 Sep 2025 00:10:15 GMT
- Title: Reinforcement Learning with Anticipation: A Hierarchical Approach for Long-Horizon Tasks
- Authors: Yang Yu
- Abstract summary: Solving long-horizon goal-conditioned tasks remains a significant challenge in reinforcement learning. We introduce Reinforcement Learning with Anticipation (RLA), a principled and potentially scalable framework designed to address these limitations. A key feature of RLA is the training of the anticipation model, which is guided by a principle of value geometric consistency.
- Score: 3.79187263097166
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Solving long-horizon goal-conditioned tasks remains a significant challenge in reinforcement learning (RL). Hierarchical reinforcement learning (HRL) addresses this by decomposing tasks into more manageable sub-tasks, but the automatic discovery of the hierarchy and the joint training of multi-level policies often suffer from instability and can lack theoretical guarantees. In this paper, we introduce Reinforcement Learning with Anticipation (RLA), a principled and potentially scalable framework designed to address these limitations. The RLA agent learns two synergistic models: a low-level, goal-conditioned policy that learns to reach specified subgoals, and a high-level anticipation model that functions as a planner, proposing intermediate subgoals on the optimal path to a final goal. The key feature of RLA is the training of the anticipation model, which is guided by a principle of value geometric consistency, regularized to prevent degenerate solutions. We present proofs that RLA approaches the globally optimal policy under various conditions, establishing a principled and convergent method for hierarchical planning and execution in long-horizon goal-conditioned tasks.
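The abstract describes the anticipation model only at this level of detail. As a hedged illustration, the sketch below assumes a goal-reaching value function of the roughly geometric form V(s, g) ≈ γ^{d(s, g)}, under which an optimal midpoint m satisfies V(s, g) = V(s, m) · V(m, g). The names V, A, and the anti-degeneracy regularizer are assumptions for this sketch, not the authors' implementation.

```python
# Hypothetical sketch of a value-geometric-consistency training signal.
# Assumes V(s, g) ~= gamma**d(s, g), so an optimal midpoint m satisfies
# V(s, g) = V(s, m) * V(m, g); trained in log space for stability.
import torch

def anticipation_loss(V, A, s, g, eps=1e-6, reg_weight=0.1):
    """V(s, g) -> (B,) value net; A(s, g) -> (B, dim_s) anticipation model."""
    m = A(s, g)                                # proposed intermediate subgoal
    v_sm = V(s, m).clamp_min(eps)
    v_mg = V(m, g).clamp_min(eps)
    v_sg = V(s, g).detach().clamp_min(eps)     # end-to-end value as the target
    # Geometric consistency: push V(s,m) * V(m,g) toward V(s,g) in log space.
    consistency = (v_sm.log() + v_mg.log() - v_sg.log()).pow(2).mean()
    # Regularizer discouraging the degenerate solutions m ~ s or m ~ g.
    degenerate = -(((m - s).pow(2).sum(-1) + eps).log()
                   + ((m - g).pow(2).sum(-1) + eps).log()).mean()
    return consistency + reg_weight * degenerate
```

In a full agent this loss would be optimized jointly with the low-level goal-conditioned policy and its value function; the weighting and regularizer form here are guesses.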
Related papers
- Zero-Shot Instruction Following in RL via Structured LTL Representations [50.41415009303967]
We study instruction following in multi-task reinforcement learning, where an agent must zero-shot execute novel tasks not seen during training. In this setting, linear temporal logic has recently been adopted as a powerful framework for specifying structured, temporally extended tasks. While existing approaches successfully train generalist policies, they often struggle to effectively capture the rich logical and temporal structure inherent in specifications.
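As a hedged aside, one common way to expose LTL structure to a policy is to compile the formula into a finite automaton and condition the policy on the automaton state. The toy DFA below is hand-written, not the paper's learned representation; it encodes "eventually a, then eventually b".

```python
# Illustrative only: a hand-coded DFA for F(a & F b), i.e. "eventually a,
# then eventually b". A policy would receive the automaton state q alongside
# its observation; the cited paper's structured representation is richer.
LTL_DFA = {
    # (state, proposition) -> next state; state 2 is accepting.
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}

def step_automaton(q, true_props):
    for p in true_props:
        q = LTL_DFA.get((q, p), q)
    return q

q = 0
for props in [{"b"}, {"a"}, set(), {"b"}]:   # observed propositions per step
    q = step_automaton(q, props)
print("accepting:", q == 2)   # True: 'a' was seen, then 'b'
```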
arXiv Detail & Related papers (2026-02-15T23:22:50Z)
- Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL [25.40364932514488]
We propose Chain-of-Goals Hierarchical Policy (CoGHP), a novel framework that reformulates hierarchical decision-making as autoregressive sequence modeling. CoGHP consistently outperforms strong offline baselines, demonstrating improved performance on long-horizon tasks.
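The summary gives no architectural detail; the sketch below is only one plausible rendering of "hierarchical decision-making as autoregressive sequence modeling": a small decoder that emits a chain of discrete subgoal tokens conditioned on state and goal features. All module names and sizes are illustrative.

```python
# Illustrative decoder that autoregressively emits a chain of subgoal tokens
# given state and goal features; not the paper's actual model.
import torch, torch.nn as nn

class GoalChainDecoder(nn.Module):
    def __init__(self, dim=32, n_subgoal_tokens=64, chain_len=4):
        super().__init__()
        self.chain_len = chain_len
        self.embed = nn.Embedding(n_subgoal_tokens, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_subgoal_tokens)
        self.cond = nn.Linear(2 * dim, dim)   # fuse state and goal features

    def forward(self, state_feat, goal_feat):
        h = self.cond(torch.cat([state_feat, goal_feat], -1)).unsqueeze(0)
        tok = torch.zeros(state_feat.size(0), 1, dtype=torch.long)  # BOS token
        chain = []
        for _ in range(self.chain_len):     # greedy decode, one subgoal each
            out, h = self.rnn(self.embed(tok), h)
            tok = self.head(out[:, -1]).argmax(-1, keepdim=True)
            chain.append(tok)
        return torch.cat(chain, 1)          # (B, chain_len) subgoal tokens

chain = GoalChainDecoder()(torch.randn(2, 32), torch.randn(2, 32))
```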
arXiv Detail & Related papers (2026-02-03T11:11:03Z)
- Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning [5.274804664403783]
Strict Subgoal Execution (SSE) is a graph-based hierarchical RL framework that enforces single-step subgoal reachability. We show that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
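A minimal sketch of the graph-based idea as summarized: keep only edges whose subgoal transition the low-level policy has verifiably achieved in a single execution, then plan over that graph. The data structures are assumptions, not SSE's actual components.

```python
# Sketch: plan subgoal chains over a graph whose edges are restricted to
# transitions the low-level policy has achieved in one subgoal step.
from collections import deque

edges = {}   # node -> set of nodes verified reachable in a single step

def record_success(u, v):
    edges.setdefault(u, set()).add(v)

def plan_subgoals(start, goal):
    """BFS over verified edges; returns a subgoal chain or None."""
    parent, frontier = {start: None}, deque([start])
    while frontier:
        u = frontier.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in edges.get(u, ()):
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    return None

record_success("s0", "s1")
record_success("s1", "s2")
print(plan_subgoals("s0", "s2"))   # ['s0', 's1', 's2']
```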
arXiv Detail & Related papers (2025-06-26T06:35:42Z)
- Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals [12.894271401094615]
A key challenge in HRL is that the low-level policy changes over time, making it difficult for the high-level policy to generate effective subgoals. We propose an approach that trains a conditional diffusion model, regularized by a Gaussian Process (GP) prior, to generate a complex variety of subgoals. Building on this framework, we develop a strategy that selects subgoals from both the diffusion policy and the GP's predictive mean.
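The following is a loose sketch of uncertainty-guided selection under the stated design: trust a GP's predictive mean where its variance is low, and fall back to a generative proposal elsewhere. The diffusion sampler is stubbed by a hypothetical function, and the 1-D GP is far simpler than the cited method.

```python
# Illustrative uncertainty-gated choice between a GP predictive mean and a
# generative proposal; the diffusion sampler is a hypothetical stand-in.
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k = rbf(x_train, x_query)
    solve = np.linalg.solve(K, k)
    mean = solve.T @ y_train
    var = rbf(x_query, x_query).diagonal() - (k * solve).sum(0)
    return mean, var

def sample_diffusion_subgoal(state):      # hypothetical stand-in
    return state + np.random.randn()

x_train, y_train = np.array([0.0, 1.0, 2.0]), np.array([0.5, 1.5, 2.5])
state = np.array([1.2])
mean, var = gp_posterior(x_train, y_train, state)
# Trust the GP mean where it is confident; explore with the generator elsewhere.
subgoal = mean[0] if var[0] < 0.1 else sample_diffusion_subgoal(state[0])
```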
arXiv Detail & Related papers (2025-05-27T20:38:44Z)
- Direct Preference Optimization for Primitive-Enabled Hierarchical Reinforcement Learning [75.9729413703531]
DIPPER is a novel HRL framework that formulates hierarchical policy learning as a bi-level optimization problem. We show that DIPPER achieves up to 40% improvement over state-of-the-art baselines in sparse reward scenarios.
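DIPPER's bi-level formulation is beyond a short sketch, but the preference-learning signal the title points to is the standard DPO loss, shown below for hypothetical subgoal preferences (chosen vs. rejected, scored against a frozen reference policy).

```python
# Standard DPO loss applied, illustratively, to subgoal preference pairs.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Log-probs under the trained policy and a frozen reference policy."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

loss = dpo_loss(torch.tensor([-1.0]), torch.tensor([-2.0]),
                torch.tensor([-1.5]), torch.tensor([-1.8]))
```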
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA). SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning. SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art automatic subgoal generation (ASG) methods.
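A rough sketch of the prompting pattern the summary describes, with hypothetical templates; SAMA's actual prompts and interaction loop are more elaborate.

```python
# Hypothetical prompt templates for LLM-driven goal decomposition and
# self-reflection-based replanning; illustrative, not SAMA's actual prompts.
DECOMPOSE_PROMPT = """You coordinate {n_agents} agents in the task below.
Task: {task_description}
Think step by step, then output:
1. A list of subgoals that together complete the task.
2. An assignment of each subgoal to one agent.
"""

REPLAN_PROMPT = """The previous plan failed with feedback: {feedback}
Reflect on what went wrong, then output a revised subgoal list and assignment.
"""

prompt = DECOMPOSE_PROMPT.format(n_agents=2,
                                 task_description="cook and serve a dish")
# `prompt` would be sent to a pretrained LLM; replanning uses REPLAN_PROMPT.
```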
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
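A minimal sketch of such a distillation step, under assumed names: for the final goal g, the student imitates the action distribution the teacher produces for the planner's next subgoal sg.

```python
# Sketch of distilling a subgoal-conditioned teacher into a goal-conditioned
# student via KL imitation; architecture and names are illustrative.
import torch, torch.nn as nn, torch.nn.functional as F

class Policy(nn.Module):
    def __init__(self, dim_s=8, dim_g=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_s + dim_g, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s, g):
        return self.net(torch.cat([s, g], -1))   # action logits

teacher, student = Policy(), Policy()
s, sg, g = torch.randn(16, 8), torch.randn(16, 8), torch.randn(16, 8)
with torch.no_grad():
    target = F.softmax(teacher(s, sg), -1)       # teacher acts toward subgoal
loss = F.kl_div(F.log_softmax(student(s, g), -1), target,
                reduction="batchmean")
loss.backward()
```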
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
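As an illustration of a discretizing bottleneck (here a plain vector-quantization layer with a straight-through estimator; the paper's factorial variant is more structured):

```python
# Toy vector-quantization bottleneck for goal encodings.
import torch

class VQBottleneck(torch.nn.Module):
    def __init__(self, n_codes=32, dim=16):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(n_codes, dim))
    def forward(self, z):
        d = torch.cdist(z, self.codebook)      # (B, n_codes) distances
        idx = d.argmin(-1)                     # nearest code per encoding
        zq = self.codebook[idx]
        # Straight-through estimator so gradients still reach the encoder.
        return z + (zq - z).detach(), idx

zq, codes = VQBottleneck()(torch.randn(4, 16))
```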
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
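One hedged reading of "composing goals in latent space": a high-level planner proposes intermediate latent subgoals between the encodings of the current state and the target goal for the low-level policy to practice. The linear interpolation below is a placeholder for the learned planner.

```python
# Placeholder for a learned latent-space planner: interpolate between the
# current state's encoding and the goal's encoding to get practice subgoals.
import numpy as np

def propose_latent_subgoals(z_state, z_goal, n=3):
    alphas = np.linspace(0, 1, n + 2)[1:-1]   # skip the two endpoints
    return [(1 - a) * z_state + a * z_goal for a in alphas]

z_state, z_goal = np.zeros(4), np.ones(4)
for z_sub in propose_latent_subgoals(z_state, z_goal):
    pass  # roll out the goal-conditioned policy toward z_sub, then fine-tune
```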
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning [22.319208517053816]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning techniques.
HRL often suffers from training inefficiency because the action space of the high-level policy, i.e., the goal space, is often large.
We show that this inefficiency can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state, as sketched below.
arXiv Detail & Related papers (2020-06-20T03:34:45Z)
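A toy sketch of the adjacency constraint: the high-level policy may only emit subgoals estimated to lie within $k$ low-level steps of the current state. The Euclidean steps-to-reach estimate stands in for the learned adjacency measure used in the paper.

```python
# Project proposed subgoals back into the k-step adjacent region of the
# current state; the distance estimate is a crude stand-in for a learned
# adjacency network.
import numpy as np

K = 5  # adjacency horizon in low-level steps

def step_distance(s, g, step_size=1.0):
    return np.linalg.norm(g - s) / step_size   # estimated steps to reach g

def project_subgoal(s, proposed, k=K):
    """Clip a proposed subgoal into the k-step adjacent region of s."""
    d = step_distance(s, proposed)
    if d <= k:
        return proposed
    return s + (proposed - s) * (k / d)

s = np.zeros(2)
print(project_subgoal(s, np.array([12.0, 0.0])))   # -> [5. 0.]
```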