Related papers: PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models

PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models

URL: http://arxiv.org/abs/2601.19917v1
Date: Wed, 07 Jan 2026 12:38:56 GMT
Title: PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models
Authors: Haoyu Zheng, Yun Zhu, Yuqian Yuan, Bo Yuan, Wenqiao Zhang, Siliang Tang, Jun Xiao,
Abstract summary: Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks.<n>We propose PILOT, a framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance.
Score: 51.43746425777865
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Strategic planning is critical for multi-step reasoning, yet compact Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks. Our analysis reveals that LLMs possess latent reasoning capabilities that can be unlocked when conditioned on explicit plans from a teacher model; however, runtime reliance on external guidance is often impractical due to latency and availability constraints. To bridge this gap, we propose PILOT (Planning via Internalized Latent Optimization Trajectories), a non-invasive framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance. Instead of altering backbone weights, PILOT employs a lightweight Hyper-Network to synthesize a query-conditioned Latent Guidance vector. This vector acts as an internal steering mechanism, guiding the model's representations toward optimal reasoning paths. Extensive experiments on mathematical and coding benchmarks demonstrate that PILOT effectively stabilizes reasoning trajectories, consistently outperforming strong baselines (e.g., +8.9% on MATH500) with negligible inference latency.

Related papers

EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs [0.36050743818632486]
Large language models (LLMs) for reasoning tasks on edge GPU face critical challenges from strict latency constraints and limited computational resources.<n>To navigate these constraints, developers must balance reasoning versus non-reasoning architectures, selecting appropriate model sizes, allocating token budgets, and applying test-time scaling strategies.<n>We present EdgeReasoning, a comprehensive study characterizing the deployment of reasoning LLMs on edge GPUs.
arXiv Detail & Related papers (2025-10-21T04:18:25Z)
NoTVLA: Narrowing of Dense Action Trajectories for Generalizable Robot Manipulation [54.87964060934928]
Vision-Language-Action (VLA) models confront critical barriers to real-world deployment, most notably catastrophic forgetting.<n>We propose the Narrowing of Trajectory VLA framework: a novel approach that narrows its focus to sparse trajectories.<n>NoTVLA achieves superior performance and generalization compared to pi0 while operating under two critical constraints.
arXiv Detail & Related papers (2025-10-04T18:26:55Z)
HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds [0.0]
Large language models (LLMs) have shown remarkable capabilities in isolated step-by-step reasoning tasks such as mathematics and programming.<n>But their proficiency in long-horizon planning, where solutions require extended, structured sequences of interdependent actions, remains underexplored.<n>We introduce HeroBench, a novel benchmark designed specifically to evaluate long-horizon planning and structured reasoning within complex RPG-inspired virtual worlds.
arXiv Detail & Related papers (2025-08-18T09:59:02Z)
Reinforced Reasoning for Embodied Planning [18.40186665383579]
Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals.<n>We introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning.
arXiv Detail & Related papers (2025-05-28T07:21:37Z)
Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration [15.711365331854614]
We introduce Dynamic Adaptation of Reasoning Trajectories (DART), a novel data adaptation framework.<n>Instead of uniformly imitating expert steps, DART employs a selective imitation strategy guided by step-wise adaptability estimation.<n>We validate DART across multiple reasoning benchmarks and model scales, demonstrating that it significantly improves generalization and data efficiency.
arXiv Detail & Related papers (2025-05-27T04:08:11Z)
Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.<n>By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.<n>On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z)
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.<n>We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.<n>Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories. Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
arXiv Detail & Related papers (2024-02-01T15:18:33Z)
LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner. Our approach navigates complex scenarios which existing planners struggle with, produces well-reasoned outputs while also remaining grounded through working alongside the rule-based approach.
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
Large Trajectory Models are Scalable Motion Predictors and Planners [25.03447801499]
Motion prediction and planning are vital tasks in autonomous driving. We introduce a scalable trajectory model called State Transformer (STR) STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task.
arXiv Detail & Related papers (2023-10-30T15:12:41Z)
Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network [5.000272778136268]
This study shows that the predictive coding (PC) and active inference (AIF) frameworks can develop better generalization by learning a prior distribution in a low dimensional latent state space. In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights for maximizing the evidence lower bound. Our proposed model was evaluated with both simple and complex robotic tasks in simulation, which demonstrated sufficient generalization in learning with limited training data.
arXiv Detail & Related papers (2020-05-27T06:43:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.