Reasoning with Language Model is Planning with World Model
- URL: http://arxiv.org/abs/2305.14992v2
- Date: Mon, 23 Oct 2023 07:24:28 GMT
- Title: Reasoning with Language Model is Planning with World Model
- Authors: Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe
Wang, Zhiting Hu
- Abstract summary: Large language models (LLMs) have shown remarkable reasoning capabilities.
LLMs lack an internal $\textit{world model}$ to predict the world $\textit{state}$.
We propose a new LLM reasoning framework, $\underline{R}$easoning vi$\underline{a}$ $\underline{P}$lanning $\textbf{(RAP)}$.
- Score: 27.24144881796878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have shown remarkable reasoning capabilities,
especially when prompted to generate intermediate reasoning steps (e.g.,
Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are
easy for humans, such as generating action plans for executing tasks in a given
environment, or performing complex math, logical, and commonsense reasoning.
The deficiency stems from the key fact that LLMs lack an internal
$\textit{world model}$ to predict the world $\textit{state}$ (e.g., environment
status, intermediate variable values) and simulate long-term outcomes of
actions. This prevents LLMs from performing deliberate planning akin to human
brains, which involves exploring alternative reasoning paths, anticipating
future states and rewards, and iteratively refining existing reasoning steps.
To overcome the limitations, we propose a new LLM reasoning framework,
$\underline{R}$easoning vi$\underline{a}$ $\underline{P}$lanning
$\textbf{(RAP)}$. RAP repurposes the LLM as both a world model and a reasoning
agent, and incorporates a principled planning algorithm (based on Monte Carlo
Tree Search) for strategic exploration in the vast reasoning space. During
reasoning, the LLM (as agent) incrementally builds a reasoning tree under the
guidance of the LLM (as world model) and task-specific rewards, and obtains a
high-reward reasoning path efficiently with a proper balance between
exploration $\textit{vs.}$ exploitation. We apply RAP to a variety of
challenging reasoning problems including plan generation, math reasoning, and
logical inference. Empirical results on these tasks demonstrate the superiority
of RAP over various strong baselines, including CoT and least-to-most prompting
with self-consistency. RAP on LLaMA-33B surpasses CoT on GPT-4 with 33%
relative improvement in a plan generation setting.
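The abstract above spells out the loop RAP runs: the LLM acting as agent proposes candidate reasoning steps, the LLM acting as world model predicts the resulting state, a task-specific reward scores it, and Monte Carlo Tree Search balances exploration against exploitation over the growing reasoning tree. Below is a minimal sketch of that loop for illustration only, not the authors' implementation; llm_propose_actions, llm_next_state, and llm_reward are hypothetical placeholders for prompted LLM calls.

```python
# Minimal sketch of RAP-style reasoning via MCTS (illustrative only).
# llm_propose_actions / llm_next_state / llm_reward stand in for prompted
# LLM calls (agent, world model, task-specific reward).
import math
import random


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state          # world state predicted by the LLM world model
        self.parent = parent
        self.action = action        # reasoning step that led to this state
        self.children = []
        self.visits = 0
        self.total_reward = 0.0


def depth(node):
    count = 0
    while node.parent is not None:
        node, count = node.parent, count + 1
    return count


def uct_score(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    exploit = node.total_reward / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore        # exploitation vs. exploration trade-off


def rap_mcts(root_state, llm_propose_actions, llm_next_state, llm_reward,
             n_iters=100, depth_limit=6):
    root = Node(root_state)
    for _ in range(n_iters):
        # 1) Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=uct_score)
        # 2) Expansion: the LLM-as-agent proposes candidate steps, and the
        #    LLM-as-world-model predicts the state each step leads to.
        if node.visits > 0 and depth(node) < depth_limit:
            for action in llm_propose_actions(node.state):
                child = Node(llm_next_state(node.state, action), node, action)
                node.children.append(child)
            if node.children:
                node = random.choice(node.children)
        # 3) Evaluation: task-specific reward for the reached state.
        reward = llm_reward(node.state)
        # 4) Backpropagation of the reward along the visited path.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Return the sequence of steps along the highest average-reward branch.
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda c: c.total_reward / max(c.visits, 1))
        path.append(node.action)
    return path
```

The returned path is the sequence of reasoning steps along the highest average-reward branch, which corresponds to the high-reward reasoning path the abstract refers to.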
Related papers
- Language Agents Meet Causality -- Bridging LLMs and Causal World Models [50.79984529172807]
We propose a framework that integrates causal representation learning with large language models.
This framework learns a causal world model, with causal variables linked to natural language expressions.
We evaluate the framework on causal inference and planning tasks across temporal scales and environmental complexities.
arXiv Detail & Related papers (2024-10-25T18:36:37Z)
- FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use the Large Language Models to plan a solution, soft-formalise the query into facts and predicates using a logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z)
- Can LLMs Reason in the Wild with Programs? [20.47557047823847]
We introduce the task of reasoning in the wild, where an LLM is tasked to solve a reasoning problem of unknown type.
We create a large tactic-guided trajectory dataset containing detailed solutions to a diverse set of reasoning problems.
In experiments, we highlight that existing LLMs fail significantly on problems with ambiguous and mixed scope.
arXiv Detail & Related papers (2024-06-19T18:26:19Z)
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
- GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment composing 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z)
- Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction.
Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
- Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning [10.633920029087676]
Large language models (LLMs) have achieved remarkable success across a wide spectrum of tasks.
We propose a new benchmark, termed $\textbf{P}$ath $\textbf{P}$lanning from $\textbf{N}$atural $\textbf{L}$anguage.
arXiv Detail & Related papers (2023-10-05T01:42:16Z)
- Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency [53.8779374188643]
We propose a principled framework with provable regret guarantees to orchestrate reasoning and acting.
Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon.
At each step, the LLM agent takes the initial action of the planned trajectory ("act for now"), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state.
arXiv Detail & Related papers (2023-09-29T16:36:39Z)
- Furthest Reasoning with Plan Assessment: Stable Reasoning Path with Retrieval-Augmented Large Language Models [10.04323204974924]
Multi-Hop Question Answering (MHQA) stands as a widely discussed category.
Existing methods employ Large Language Models (LLMs) to generate reasoning paths and plans.
We propose a novel pipeline for MHQA called Furthest-Reasoning-with-Plan-Assessment (FuRePA).
arXiv Detail & Related papers (2023-09-22T10:15:13Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
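The SatLM entry directly above follows a declarative pattern: the LLM writes a set of constraints rather than a step-by-step imperative program, and an off-the-shelf solver derives the answer. Below is a minimal sketch of that pattern, assuming the Z3 SMT solver (the z3-solver Python package) as the backend; the hard-coded constraints stand in for what an LLM would generate and are not taken from the paper.

```python
# Minimal sketch of satisfiability-aided reasoning: declare what must hold,
# let an off-the-shelf solver find the answer. Requires: pip install z3-solver
from z3 import Int, Solver, sat

# Word problem: "Alice has 3 times as many apples as Bob. Together they have
# 24 apples. How many apples does Alice have?"
alice, bob = Int("alice"), Int("bob")

solver = Solver()
solver.add(alice == 3 * bob)      # declarative facts, not step-by-step reasoning
solver.add(alice + bob == 24)
solver.add(bob >= 0)

if solver.check() == sat:
    model = solver.model()
    print("alice =", model[alice])  # -> 18
else:
    print("unsatisfiable specification")
```

Because the specification states only what must hold, the solver rather than the language model carries out the actual deduction, which is the division of labor the SatLM summary describes.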
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.