Large Language Models Can Take False First Steps at Inference-time Planning
- URL: http://arxiv.org/abs/2602.02991v1
- Date: Tue, 03 Feb 2026 01:54:55 GMT
- Title: Large Language Models Can Take False First Steps at Inference-time Planning
- Authors: Haijiang Yan, Jian-Qiao Zhu, Adam Sanborn
- Abstract summary: Large language models (LLMs) have been shown to acquire sequence-level planning abilities during training. Planning behavior exhibited at inference time, however, often appears short-sighted and inconsistent with these capabilities. We propose a Bayesian account of this gap by grounding planning behavior in the evolving generative context.
- Score: 2.6100783621884625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have been shown to acquire sequence-level planning abilities during training, yet their planning behavior exhibited at inference time often appears short-sighted and inconsistent with these capabilities. We propose a Bayesian account for this gap by grounding planning behavior in the evolving generative context: given the subtle differences between natural language and the language internalized by LLMs, accumulated self-generated context drives a planning-shift during inference and thereby creates the appearance of compromised planning behavior. We further validate the proposed model through two controlled experiments: a random-generation task demonstrating constrained planning under human prompts and increasing planning strength as self-generated context accumulates, and a Gaussian-sampling task showing reduced initial bias when conditioning on self-generated sequences. These findings provide a theoretical explanation along with empirical evidence for characterizing how LLMs plan ahead during inference.
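The Gaussian-sampling intuition from the abstract can be illustrated with a minimal toy sketch (not the paper's code, and all numbers below are hypothetical): a conjugate Gaussian update in which a prompt-induced prior bias shrinks as self-generated observations accumulate, mirroring the claimed planning-shift during inference.

```python
# Hypothetical toy model (not the paper's method): a conjugate Gaussian
# update illustrating how accumulated self-generated context can shrink
# an initially prompt-induced bias.

def posterior_mean(prior_mean, prior_var, obs, obs_var):
    """Posterior mean of a Gaussian mean with known observation variance."""
    n = len(obs)
    if n == 0:
        return prior_mean
    sample_mean = sum(obs) / n
    precision = 1.0 / prior_var + n / obs_var
    return (prior_mean / prior_var + n * sample_mean / obs_var) / precision

# The human prompt induces a biased prior (mean 2.0); the model's own
# generations are represented by a fixed stand-in sequence centred at 0.0,
# its hypothetical internalized distribution.
prior_mean, prior_var, obs_var = 2.0, 1.0, 1.0
context = [0.0] * 50  # deterministic stand-in for self-generated samples

early = posterior_mean(prior_mean, prior_var, context[:2], obs_var)  # 2/3
late = posterior_mean(prior_mean, prior_var, context, obs_var)       # 2/51
print(f"bias after 2 samples: {early:.3f}; after 50 samples: {late:.3f}")
```

With a conjugate Normal-Normal model the posterior mean is a precision-weighted average, so the prior's pull (the "prompt bias") decays as 1/(1+n) here, consistent with the abstract's report of reduced initial bias when conditioning on self-generated sequences.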
Related papers
- iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning [28.763018368302117]
Large language models (LLMs) can perform reliable step-by-step reasoning during problem-solving. Generating accurate and effective textual plans, however, remains challenging due to hallucinations. We propose iCLP, a novel framework that enables LLMs to adaptively generate latent plans.
arXiv Detail & Related papers (2025-12-30T06:19:04Z) - Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification [17.948161564138033]
Reasoning Vision Language Action (VLA) models improve robotic instruction-following by generating step-by-step textual plans before low-level actions. Even with a correct textual plan, however, the generated actions can still miss the intended outcomes, especially in out-of-distribution scenarios. We formalize this phenomenon as a lack of embodied CoT faithfulness and introduce a training-free, runtime policy steering method for reasoning-action alignment.
arXiv Detail & Related papers (2025-10-18T00:38:45Z) - Detecting and Characterizing Planning in Language Models [1.320426480090921]
We present formal and causally grounded criteria for detecting planning and operationalize them as a semi-automated annotation pipeline. We apply this pipeline to both base and instruction-tuned Gemma-2-2B models on the MBPP code generation benchmark and a poem generation task. Our findings show that planning is not universal: unlike Haiku, Gemma-2-2B solves the same poem generation task through improvisation, and on MBPP it switches between planning and improvisation across similar tasks and even successive token predictions.
arXiv Detail & Related papers (2025-08-25T14:59:46Z) - Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study [0.0]
Large Language Models have sparked interest in their potential for robotic task planning. While these models demonstrate strong generative capabilities, their effectiveness in producing structured and executable plans remains uncertain. This paper presents a systematic evaluation of a broad spectrum of current state-of-the-art language models.
arXiv Detail & Related papers (2025-07-31T14:25:54Z) - Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and an inverse dynamics model. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - LLM-SAP: Large Language Models Situational Awareness Based Planning [0.0]
We employ a multi-agent reasoning framework to develop a methodology that anticipates and actively mitigates potential risks.
Our approach diverges from traditional automata theory by incorporating the complexity of human-centric interactions into the planning process.
arXiv Detail & Related papers (2023-12-26T17:19:09Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - PDPP: Projected Diffusion for Procedure Planning in Instructional Videos [18.984980596601513]
We study the problem of procedure planning in instructional videos, which aims to make a plan (i.e., a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence modeling problem and leverage either intermediate visual observations or language instructions as supervision. To avoid intermediate supervision annotation and the error accumulation caused by planning autoregressively, we propose a diffusion-based framework.
arXiv Detail & Related papers (2023-03-26T10:50:16Z) - Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z) - Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: pre-trained models succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.