Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes
- URL: http://arxiv.org/abs/2512.17846v1
- Date: Fri, 19 Dec 2025 17:49:13 GMT
- Title: Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes
- Authors: Carlos Vélez García, Miguel Cazorla, Jorge Pomares,
- Abstract summary: Planning as Descent (PaD) is a framework for offline goal-conditioned reinforcement learning.<n>PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures.<n>Our results suggest learning to evaluate and refine trajectories provides a robust alternative to direct policy learning.
- Score: 0.8703455323398351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Planning as Descent (PaD), a framework for offline goal-conditioned reinforcement learning that grounds trajectory synthesis in verification. Instead of learning a policy or explicit planner, PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures. Planning is realized as gradient-based refinement in this energy landscape, using identical computation during training and inference to reduce train-test mismatch common in decoupled modeling pipelines. PaD is trained via self-supervised hindsight goal relabeling, shaping the energy landscape around the planning dynamics. At inference, multiple trajectory candidates are refined under different temporal hypotheses, and low-energy plans balancing feasibility and efficiency are selected. We evaluate PaD on OGBench cube manipulation tasks. When trained on narrow expert demonstrations, PaD achieves state-of-the-art 95\% success, strongly outperforming prior methods that peak at 68\%. Remarkably, training on noisy, suboptimal data further improves success and plan efficiency, highlighting the benefits of verification-driven planning. Our results suggest learning to evaluate and refine trajectories provides a robust alternative to direct policy learning for offline, reward-free planning.
Related papers
- Improving Diffusion Planners by Self-Supervised Action Gating with Energies [31.430422680816907]
We propose Self-supervised Action Gating with Energies (SAGE) to penalise dynamically inconsistent plans using a latent consistency signal.<n>SAGE trains a Joint-Embedding Predictive Architecture (JEPA) encoder on offline state sequences and an action-conditioned latent predictor for short horizon transitions.<n>At test time, SAGE assigns each sampled candidate an energy given by its latent prediction error and combines this feasibility score with value estimates to select actions.
arXiv Detail & Related papers (2026-03-03T06:36:16Z) - Closing the Train-Test Gap in World Models for Gradient-Based Planning [64.36544881136405]
We propose improved methods for training world models that enable efficient gradient-based planning.<n>At test time, our approach outperforms or matches the classical gradient-free cross-entropy method.
arXiv Detail & Related papers (2025-12-10T18:59:45Z) - DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping [74.34061104176554]
We propose DeepPlanner, an end-to-end RL framework that effectively enhances the planning capabilities of deep research agents.<n>Our approach shapes token-level advantage with an entropy-based term to allocate larger updates to high entropy tokens, and selectively upweights sample-level advantages for planning-intensive rollouts.
arXiv Detail & Related papers (2025-10-14T20:47:05Z) - A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks [66.86312354478478]
Agents based on large language models (LLMs) struggle with brainless trial-and-error and generating hallucinatory actions due to a lack of global planning in long-horizon tasks.<n>We introduce a plan-and-execute framework and propose a planner training method to enhance the executor agent's planning abilities without human effort.<n>Experiments show that executor agents equipped with our planner outperform existing methods, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2025-10-07T06:10:53Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.<n>We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.<n>Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference [53.419249906014194]
We study generative modeling for planning with datasets repurposed from offline reinforcement learning.<n>We introduce the Latent Plan Transformer (), a novel model that leverages a latent variable to connect a Transformer-based trajectory generator and the final return.
arXiv Detail & Related papers (2024-02-07T08:18:09Z) - Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z) - Active Learning of Abstract Plan Feasibility [17.689758291966502]
We present an active learning approach to efficiently acquire an APF predictor through task-independent, curious exploration on a robot.
We leverage an infeasible subsequence property to prune candidate plans in the active learning strategy, allowing our system to learn from less data.
In a stacking domain where objects have non-uniform mass distributions, we show that our system permits real robot learning of an APF model in four hundred self-supervised interactions.
arXiv Detail & Related papers (2021-07-01T18:17:01Z) - PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance whilst being around 4-8 times more sample efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.