Natural Language Planning via Coding and Inference Scaling
- URL: http://arxiv.org/abs/2505.13252v1
- Date: Mon, 19 May 2025 15:35:17 GMT
- Title: Natural Language Planning via Coding and Inference Scaling
- Authors: Rikhil Amonkar, Ronan Le Bras, Li Zhang
- Abstract summary: We show that programming often but not always outperforms planning. Our detailed error analysis also indicates a lack of robustness and efficiency in the generated code that hinders generalization.
- Score: 15.79089054416743
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Real-life textual planning tasks such as meeting scheduling pose a considerable challenge to LLMs, especially when complexity is high. While previous work primarily studied auto-regressive generation of plans with closed-source models, we systematically evaluate both closed- and open-source models, including those that scale output length with complexity during inference, in generating programs, which are executed to output the plan. We consider not only standard Python code, but also code for a constraint satisfaction problem solver. Despite the algorithmic nature of the task, we show that programming often but not always outperforms planning. Our detailed error analysis also indicates a lack of robustness and efficiency in the generated code that hinders generalization.
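To make the program-generation setting concrete, here is a minimal sketch of what an LLM-generated program for a meeting-scheduling task might look like. This is an invented toy instance, not from the paper: the attendee names, busy intervals, and one-hour slot granularity are all assumptions, and plain brute-force Python stands in for a real constraint solver.

```python
# Toy meeting-scheduling instance: find one-hour start times where
# every attendee is free. Busy intervals use a 24-hour clock.
busy = {
    "alice": [(9, 10), (12, 13)],
    "bob":   [(10, 11)],
    "carol": [(9, 11), (14, 15)],
}
day = range(9, 17)  # candidate start hours, 9:00 through 16:00

def free(person, start):
    """True if `person` has no busy interval covering hour `start`."""
    return all(not (s <= start < e) for s, e in busy[person])

# Enumerate every candidate hour and keep those free for all attendees.
slots = [h for h in day if all(free(p, h) for p in busy)]
print(slots)
```

A CSP-solver variant would instead declare the start hour as a variable with domain 9-16 and each busy interval as a constraint, leaving the search to the solver; the brute-force version above only works because the toy domain is tiny.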
Related papers
- Seemingly Simple Planning Problems are Computationally Challenging: The Countdown Game [26.665033202052257]
We propose a procedure for creating a planning benchmark centered around the game called Countdown. We discuss how this problem meets many of the desiderata associated with an ideal benchmark for planning capabilities evaluation. Our results show that, unlike other domains like 24 Game (a special case of Countdown), our proposed dynamic benchmark remains extremely challenging for existing LLM-based approaches.
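For readers unfamiliar with the domain: in Countdown, one combines a set of numbers with arithmetic operations to reach a target. The sketch below is a plain brute-force solver for the game itself, not the paper's benchmark-generation procedure, and it simplifies the rules (each number used at most once, exact division only, intermediate negatives allowed).

```python
from itertools import permutations

def countdown(numbers, target):
    """Search for an arithmetic expression over a subset of `numbers`
    (each used at most once) that evaluates exactly to `target`.
    Returns the expression as a string, or None if unreachable."""
    for n in numbers:
        if n == target:
            return str(n)

    def solve(nums):
        # nums: list of (value, expression-string) pairs still available
        for (a, ea), (b, eb) in permutations(nums, 2):
            rest = list(nums)
            rest.remove((a, ea))
            rest.remove((b, eb))
            candidates = [(a + b, f"({ea}+{eb})"),
                          (a * b, f"({ea}*{eb})"),
                          (a - b, f"({ea}-{eb})")]
            if b != 0 and a % b == 0:  # only exact division
                candidates.append((a // b, f"({ea}/{eb})"))
            for value, expr in candidates:
                if value == target:
                    return expr
                found = solve(rest + [(value, expr)])
                if found:
                    return found
        return None

    return solve([(n, str(n)) for n in numbers])
```

The exhaustive search makes the computational hardness the summary alludes to visible: the number of expression trees grows super-exponentially with the count of available numbers.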
arXiv Detail & Related papers (2025-08-04T21:01:03Z) - Self-Steering Language Models [113.96916935955842]
DisCIPL is a method for "self-steering" language models. DisCIPL uses a Planner model to generate a task-specific inference program. Our work opens up a design space of highly-parallelized Monte Carlo inference strategies.
arXiv Detail & Related papers (2025-04-09T17:54:22Z) - Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code [8.971234046933349]
Large language models (LLMs) fail to plan reliably, even when prompted with a detailed definition of a planning task. We show how to use LLMs to generate correct plans, even for out-of-distribution tasks of increasing size.
arXiv Detail & Related papers (2025-03-24T15:50:20Z) - Plan-over-Graph: Towards Parallelable LLM Agent Schedule [53.834646147919436]
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then understands this task graph as input and generates a plan for parallel execution.
arXiv Detail & Related papers (2025-02-20T13:47:51Z) - Code Simulation as a Proxy for High-order Tasks in Large Language Models [6.71786454125056]
We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLMs). We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks. Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.
arXiv Detail & Related papers (2025-02-05T19:30:28Z) - Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making.
Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance.
We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z) - Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications. We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and close to optimal handcrafted task-specific policies, at lower computational cost.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach handles complex scenarios that existing planners struggle with and produces well-reasoned outputs, while remaining grounded by working alongside the rule-based planner.
arXiv Detail & Related papers (2023-12-30T02:53:45Z) - Self-planning Code Generation with Large Language Models [31.992593966465545]
This paper proposes a self-planning code generation approach with large language models.
In the planning phase, the model plans out concise solution steps from the intent combined with few-shot prompting.
In the implementation phase, the model generates code step by step, guided by the preceding solution steps.
arXiv Detail & Related papers (2023-03-12T15:36:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.