PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities
- URL: http://arxiv.org/abs/2504.14773v1
- Date: Mon, 21 Apr 2025 00:02:50 GMT
- Title: PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities
- Authors: Haoming Li, Zhaoliang Chen, Jonathan Zhang, Fei Liu
- Abstract summary: Planning is central to agents and agentic AI. To date, a comprehensive understanding of existing planning benchmarks appears to be lacking. In this paper, we examine a range of planning benchmarks to identify commonly used testbeds for algorithm development.
- Score: 7.36760703426119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning is central to agents and agentic AI. The ability to plan, e.g., creating travel itineraries within a budget, holds immense potential in both scientific and commercial contexts. Moreover, optimal plans tend to require fewer resources compared to ad-hoc methods. To date, a comprehensive understanding of existing planning benchmarks appears to be lacking. Without it, comparing planning algorithms' performance across domains or selecting suitable algorithms for new scenarios remains challenging. In this paper, we examine a range of planning benchmarks to identify commonly used testbeds for algorithm development and highlight potential gaps. These benchmarks are categorized into embodied environments, web navigation, scheduling, games and puzzles, and everyday task automation. Our study recommends the most appropriate benchmarks for various algorithms and offers insights to guide future benchmark development.
Related papers
- Parallel Strategies for Best-First Generalized Planning [51.713634067802104]
Generalized planning (GP) is a research area of AI that studies the automated synthesis of algorithmic-like solutions capable of solving multiple classical planning instances.
One of the current advancements has been the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with search.
This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap.
arXiv Detail & Related papers (2024-07-31T09:50:22Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- LLM-SAP: Large Language Models Situational Awareness Based Planning [0.0]
We employ a multi-agent reasoning framework to develop a methodology that anticipates and actively mitigates potential risks.
Our approach diverges from traditional automata theory by incorporating the complexity of human-centric interactions into the planning process.
arXiv Detail & Related papers (2023-12-26T17:19:09Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- A Planning Ontology to Represent and Exploit Planning Knowledge for Performance Efficiency [6.87593454486392]
We consider the problem of automated planning, where the objective is to find a sequence of actions that will move an agent from an initial state of the world to a desired goal state.
We hypothesize that the large number of available planners and diverse planning domains carry essential information that can be leveraged to identify suitable planners and improve their performance for a domain.
arXiv Detail & Related papers (2023-07-25T14:51:07Z)
- PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change [34.93870615625937]
PlanBench is a benchmark suite based on the kinds of domains used in the automated planning community.
PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities.
arXiv Detail & Related papers (2022-06-21T16:15:27Z)
- Adversarial Plannning [8.930624061602046]
Planning algorithms are used in computational systems to direct autonomous behavior.
It is unclear how such algorithms will perform in the face of adversaries attempting to thwart the planner.
arXiv Detail & Related papers (2022-05-01T21:43:06Z)
- Representation, learning, and planning algorithms for geometric task and motion planning [24.862289058632186]
We present a framework for learning to guide geometric task and motion planning (GTAMP).
GTAMP is a subclass of task and motion planning in which the goal is to move multiple objects to target regions among movable obstacles.
A standard graph search algorithm is not directly applicable, because GTAMP problems involve hybrid search spaces and expensive action feasibility checks.
arXiv Detail & Related papers (2022-03-09T09:47:01Z)
- Systematic Comparison of Path Planning Algorithms using PathBench [55.335463666037086]
Path planning is an essential component of mobile robotics.
Development of learning-based path planning algorithms has been experiencing rapid growth.
This paper presents PathBench, a platform for developing, visualizing, training, testing, and benchmarking of existing and future path planning algorithms.
arXiv Detail & Related papers (2022-03-07T01:52:57Z)
- Learning off-road maneuver plans for autonomous vehicles [0.0]
This thesis explores the benefits machine learning algorithms can bring to online planning and scheduling for autonomous vehicles in off-road situations.
We present a range of learning-based methods to assist different itinerary planners.
In order to synthesize strategies to execute synchronized maneuvers, we propose a novel type of scheduling controllability and a learning-assisted algorithm.
arXiv Detail & Related papers (2021-08-02T16:27:59Z)
- PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms [59.3879573040863]
Path planning is a key component in mobile robotics.
Few attempts have been made to benchmark the algorithms holistically or unify their interface.
This paper presents PathBench, a platform for developing, visualizing, training, testing, and benchmarking of existing and future path planning algorithms.
arXiv Detail & Related papers (2021-05-04T21:48:18Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)