Related papers: On the Planning, Search, and Memorization Capabilities of Large Language Models

URL: http://arxiv.org/abs/2309.01868v1
Date: Tue, 5 Sep 2023 00:19:31 GMT
Title: On the Planning, Search, and Memorization Capabilities of Large Language Models
Authors: Yunhao Yang, Anshul Tomar
Abstract summary: We investigate the potential of the state-of-the-art large language model (GPT-4) for planning tasks. We identify areas where large language models excel in solving planning problems and reveal the constraints that limit their applicability.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid advancement of large language models, such as the Generative Pre-trained Transformer (GPT) series, has had significant implications across various disciplines. In this study, we investigate the potential of the state-of-the-art large language model (GPT-4) for planning tasks. We explore its effectiveness in multiple planning subfields, highlighting both its strengths and limitations. Through a comprehensive examination, we identify areas where large language models excel in solving planning problems and reveal the constraints that limit their applicability. Our empirical analysis focuses on GPT-4's performance in planning domain extraction, graph search path planning, and adversarial planning. We then propose a way of fine-tuning a domain-specific large language model to improve its Chain of Thought (CoT) capabilities for the above-mentioned tasks. The results provide valuable insights into the potential applications of large language models in the planning domain and pave the way for future research to overcome their limitations and expand their capabilities.

Related papers

Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study [0.0]
Large Language Models have sparked interest in their potential for robotic task planning. While these models demonstrate strong generative capabilities, their effectiveness in producing structured and executable plans remains uncertain. This paper presents a systematic evaluation of a broad spectrum of current state-of-the-art language models.
arXiv Detail & Related papers (2025-07-31T14:25:54Z)
LLMs as Planning Modelers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models [24.230622369142193]
Large Language Models (LLMs) excel in various natural language tasks but often struggle with long-horizon planning problems. This limitation has drawn interest in integrating neuro-symbolic approaches within the Automated Planning (AP) and Natural Language Processing (NLP) communities.
arXiv Detail & Related papers (2025-03-22T03:35:44Z)
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability [59.72892401927283]
We evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks. Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints.
arXiv Detail & Related papers (2024-09-30T03:58:43Z)
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes. CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks. It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs). We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning [10.704716790096498]
Large language models (LLMs) have demonstrated impressive task-solving capabilities, achieved through either prompting techniques or system designs. This paper investigates the impact of fine-tuning on LLMs' planning capabilities. We propose the Maximum Diversity Fine-Tuning (MDFT) strategy to improve the sample efficiency of fine-tuning in the planning domain.
arXiv Detail & Related papers (2024-06-15T03:06:14Z)
Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset [0.0]
We present PARADISE, an abductive reasoning task in Q&A format built on practical procedural text sourced from wikiHow. It involves warning and tip inference tasks directly associated with goals, excluding intermediary steps, with the aim of testing the ability of models to infer implicit knowledge of the plan solely from the given goal. Our experiments, utilizing fine-tuned language models and zero-shot prompting, reveal the effectiveness of task-specific small models over large language models in most scenarios.
arXiv Detail & Related papers (2024-03-05T18:01:59Z)
Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability [6.4111574364474215]
The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability. This paper introduces an innovative approach that aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios.
arXiv Detail & Related papers (2024-01-15T18:01:59Z)
A Planning Ontology to Represent and Exploit Planning Knowledge for Performance Efficiency [6.87593454486392]
We consider the problem of automated planning, where the objective is to find a sequence of actions that will move an agent from an initial state of the world to a desired goal state. We hypothesize that the large number of available planners and diverse planning domains carry essential information that can be leveraged to identify suitable planners and improve their performance for a domain.
arXiv Detail & Related papers (2023-07-25T14:51:07Z)
Explicit Planning Helps Language Models in Logical Reasoning [39.27163698914806]
We propose LEAP, a novel system that uses language models to perform multi-step logical reasoning. Explicit planning enables the system to make more informed reasoning decisions at each step. Our system significantly outperforms other competing methods on multiple standard datasets.
arXiv Detail & Related papers (2023-03-28T03:55:03Z)
Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences. In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
Forethought and Hindsight in Credit Assignment [62.05690959741223]
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.