Can Large Language Models be Good Path Planners? A Benchmark and
Investigation on Spatial-temporal Reasoning
- URL: http://arxiv.org/abs/2310.03249v2
- Date: Wed, 7 Feb 2024 20:18:54 GMT
- Title: Can Large Language Models be Good Path Planners? A Benchmark and
Investigation on Spatial-temporal Reasoning
- Authors: Mohamed Aghzal, Erion Plaku, Ziyu Yao
- Abstract summary: Large language models (LLMs) have achieved remarkable success across a wide spectrum of tasks.
We propose a new benchmark, termed $textbfP$ath $textbfP$lanning from $textbfN$atural $textbfL$anguage.
- Score: 10.633920029087676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have achieved remarkable success across a wide
spectrum of tasks; however, they still face limitations in scenarios that
demand long-term planning and spatial reasoning. To facilitate this line of
research, in this work, we propose a new benchmark, termed $\textbf{P}$ath
$\textbf{P}$lanning from $\textbf{N}$atural $\textbf{L}$anguage
($\textbf{PPNL}$). Our benchmark evaluates LLMs' spatial-temporal reasoning by
formulating ''path planning'' tasks that require an LLM to navigate to target
locations while avoiding obstacles and adhering to constraints. Leveraging this
benchmark, we systematically investigate LLMs including GPT-4 via different
few-shot prompting methodologies as well as BART and T5 of various sizes via
fine-tuning. Our experimental results show the promise of few-shot GPT-4 in
spatial reasoning, when it is prompted to reason and act interleavedly,
although it still fails to perform long-term temporal reasoning. In contrast,
while fine-tuned LLMs achieved impressive results on in-distribution reasoning
tasks, they struggled to generalize to larger environments or environments with
more obstacles.
Related papers
- FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use the Large Language Models to plan a solution, soft-formalise the query into facts and predicates using a logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Look Further Ahead: Testing the Limits of GPT-4 in Path Planning [9.461626534488117]
Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks.
Our proposed benchmark systematically tests path-planning skills in complex settings.
We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness.
arXiv Detail & Related papers (2024-06-17T18:12:56Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Can large language models explore in-context? [87.49311128190143]
We deploy Large Language Models as agents in simple multi-armed bandit environments.
We find that the models do not robustly engage in exploration without substantial interventions.
arXiv Detail & Related papers (2024-03-22T17:50:43Z) - LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs)
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic
Tabletop Manipulation [38.66406497318709]
This work focuses on the tabletop manipulation task and releases a simulation benchmark, textitLoHoRavens, which covers various long-horizon reasoning aspects spanning color, size, space, arithmetics and reference.
We investigate two methods of bridging the modality gap: caption generation and learnable interface for incorporating explicit and implicit observation feedback to the LLM.
arXiv Detail & Related papers (2023-10-18T14:53:14Z) - Reason for Future, Act for Now: A Principled Framework for Autonomous
LLM Agents with Provable Sample Efficiency [53.8779374188643]
We propose a principled framework with provable regret guarantees to orchestrate reasoning and acting.
Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon.
At each step, the LLM agent takes the initial action of the planned trajectory ("act for now"), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state.
arXiv Detail & Related papers (2023-09-29T16:36:39Z) - Reasoning with Language Model is Planning with World Model [27.24144881796878]
Large language models (LLMs) have shown remarkable reasoning capabilities.
LLMs lack an internal $textitworld model$ to predict the world.
We propose a new LLM reasoning framework, $underlineR$easoning vi$underlinea$ $underlineP$lanning $textbf(RAP)$.
arXiv Detail & Related papers (2023-05-24T10:28:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.