Look Further Ahead: Testing the Limits of GPT-4 in Path Planning
- URL: http://arxiv.org/abs/2406.12000v2
- Date: Thu, 20 Jun 2024 19:53:52 GMT
- Title: Look Further Ahead: Testing the Limits of GPT-4 in Path Planning
- Authors: Mohamed Aghzal, Erion Plaku, Ziyu Yao
- Abstract summary: Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks.
Our proposed benchmark systematically tests path-planning skills in complex settings.
We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness.
- Score: 9.461626534488117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks. However, they still face challenges with long-horizon planning. To study this, we propose path planning tasks as a platform to evaluate LLMs' ability to navigate long trajectories under geometric constraints. Our proposed benchmark systematically tests path-planning skills in complex settings. Using this, we examined GPT-4's planning abilities using various task representations and prompting approaches. We found that framing prompts as Python code and decomposing long trajectory tasks improve GPT-4's path planning effectiveness. However, while these approaches show some promise for improving the model's planning ability, they do not yield optimal paths and fail to generalize over extended horizons.
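The abstract does not spell out the exact prompt format used in the experiments, so the sketch below is only illustrative of the two ideas it highlights: rendering a planning instance as Python code for the model to complete, and decomposing a long trajectory into shorter legs. The grid encoding, the waypoint-based decomposition, and the `query_llm` stand-in are assumptions for illustration, not the paper's protocol.

```python
# Minimal sketch (not the paper's protocol) of two prompting ideas from the
# abstract: (1) framing the path-planning query as Python code, and
# (2) decomposing a long trajectory into shorter sub-trajectories.

from typing import Callable, List, Tuple

Coord = Tuple[int, int]

def code_style_prompt(grid_size: int, obstacles: List[Coord],
                      start: Coord, goal: Coord) -> str:
    """Render the task as a small Python snippet for the model to complete."""
    return (
        "# 2D grid path planning\n"
        f"GRID_SIZE = {grid_size}\n"
        f"OBSTACLES = {obstacles}  # cells that cannot be entered\n"
        f"START = {start}\n"
        f"GOAL = {goal}\n"
        "# Complete `path` as a list of adjacent, obstacle-free cells\n"
        "# from START to GOAL.\n"
        "path = ["
    )

def decompose(start: Coord, goal: Coord,
              waypoints: List[Coord]) -> List[Tuple[Coord, Coord]]:
    """Split a long trajectory into shorter legs via intermediate waypoints."""
    stops = [start, *waypoints, goal]
    return list(zip(stops[:-1], stops[1:]))

def plan_with_llm(query_llm: Callable[[str], List[Coord]],
                  grid_size: int, obstacles: List[Coord],
                  start: Coord, goal: Coord,
                  waypoints: List[Coord]) -> List[Coord]:
    """Query the model once per leg and stitch the partial paths together.

    `query_llm` is a hypothetical stand-in for whatever client returns the
    model's completion parsed into a list of coordinates.
    """
    full_path: List[Coord] = []
    for leg_start, leg_goal in decompose(start, goal, waypoints):
        prompt = code_style_prompt(grid_size, obstacles, leg_start, leg_goal)
        leg = query_llm(prompt)  # e.g. [(0, 0), (0, 1), ...]
        # Skip the first cell of each later leg so joints are not duplicated.
        full_path.extend(leg if not full_path else leg[1:])
    return full_path
```

Even with such representations, the abstract notes that the resulting paths are often suboptimal and that performance does not generalize to longer horizons.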
Related papers
- QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds [51.05639500325598]
QuadrupedGPT is a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet.
Our agent processes human commands and environmental contexts using a large multimodal model (LMM).
It is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals.
arXiv Detail & Related papers (2024-06-24T12:14:24Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs)
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [50.27313829438866]
Plan-Seq-Learn (PSL) is a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control.
PSL achieves success rates of over 85%, outperforming language-based, classical, and end-to-end approaches.
arXiv Detail & Related papers (2024-05-02T17:59:31Z) - LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach handles complex scenarios that existing planners struggle with, producing well-reasoned outputs while remaining grounded by working alongside the rule-based planner.
arXiv Detail & Related papers (2023-12-30T02:53:45Z) - Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning [10.633920029087676]
Large language models (LLMs) have achieved remarkable success across a wide spectrum of tasks.
We propose a new benchmark, termed $\textbf{P}$ath $\textbf{P}$lanning from $\textbf{N}$atural $\textbf{L}$anguage.
arXiv Detail & Related papers (2023-10-05T01:42:16Z) - On the Planning, Search, and Memorization Capabilities of Large Language Models [0.0]
We investigate the potential of the state-of-the-art large language model (GPT-4) for planning tasks.
We identify areas where large language models excel in solving planning problems and reveal the constraints that limit their applicability.
arXiv Detail & Related papers (2023-09-05T00:19:31Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks under physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that plans generated by our TaPA framework achieve a higher success rate than those of LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning [45.51792981370957]
We investigate the applicability of a smaller class of large language models (LLMs) in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially.
Our method grounds the LLM's input in the domain, represented as a scene graph, enabling it to translate human requests into executable robot plans.
Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential for the future application of neuro-symbolic planning methods in robotics.
arXiv Detail & Related papers (2023-05-12T18:14:32Z) - Hierarchies of Planning and Reinforcement Learning for Robot Navigation [22.08479169489373]
In many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available.
Previous work has demonstrated efficient learning through hierarchical approaches that perform path planning in the HL representation.
This work proposes a novel hierarchical framework that utilizes a trainable planning policy for the HL representation.
arXiv Detail & Related papers (2021-09-23T07:18:15Z) - PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN achieves comparable performance whilst being around 4-8 times more sample-efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)