GridRoute: A Benchmark for LLM-Based Route Planning with Cardinal Movement in Grid Environments
- URL: http://arxiv.org/abs/2505.24306v1
- Date: Fri, 30 May 2025 07:40:59 GMT
- Title: GridRoute: A Benchmark for LLM-Based Route Planning with Cardinal Movement in Grid Environments
- Authors: Kechen Li, Yaotian Tao, Ximing Wen, Quanwei Sun, Zifei Gong, Chang Xu, Xizhe Zhang, Tianbo Ji
- Abstract summary: Large Language Models (LLMs) have demonstrated their potential in planning and reasoning tasks. We propose a comprehensive evaluation benchmark, GridRoute, to assess how LLMs can take advantage of traditional algorithms. We also propose a novel hybrid prompting technique called Algorithm of Thought (AoT), which introduces traditional algorithms' guidance into prompting.
- Score: 14.584687937592536
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated their potential in planning and reasoning tasks, offering a flexible alternative to classical pathfinding algorithms. However, most existing studies focus on LLMs' independent reasoning capabilities and overlook the potential synergy between LLMs and traditional algorithms. To fill this gap, we propose GridRoute, a comprehensive evaluation benchmark for assessing how LLMs can take advantage of traditional algorithms. We also propose a novel hybrid prompting technique called Algorithm of Thought (AoT), which introduces traditional algorithms' guidance into prompting. Our benchmark evaluates six LLMs ranging from 7B to 72B parameters, assessing their correctness, optimality, and efficiency in grid environments of varying sizes. Our results show that AoT significantly boosts performance across all model sizes, particularly in larger or more complex environments, suggesting a promising approach to addressing path planning challenges. Our code is open-sourced at https://github.com/LinChance/GridRoute.
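The abstract describes AoT only at a high level, so the sketch below is a guess at the mechanics: run a classical planner (here, plain A*) and inject its trace into the prompt as guidance. The `query_llm` call and the exact prompt wording are hypothetical, not the paper's actual format.

```python
import heapq

def astar(grid, start, goal):
    """Plain A* on a 4-connected grid; 0 = free cell, 1 = wall."""
    rows, cols = len(grid), len(grid[0])

    def h(p):  # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]
    visited = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        r, c = node
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # N, S, W, E
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None

def aot_prompt(grid, start, goal):
    """Embed the classical algorithm's trace into the prompt as guidance."""
    reference = astar(grid, start, goal)
    return (
        "Plan a route on a grid using cardinal moves (N/S/E/W).\n"
        f"Grid (0 = free, 1 = wall): {grid}\n"
        f"Start: {start}, Goal: {goal}\n"
        f"A reference A* search returns this cell sequence: {reference}\n"
        "Reason step by step as the algorithm would, then output your move list."
    )

grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(aot_prompt(grid, (0, 0), (2, 0)))
# answer = query_llm(aot_prompt(grid, (0, 0), (2, 0)))  # hypothetical LLM call
```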
Related papers
- Fine-tuning Large Language Model for Automated Algorithm Design [23.04239252690957]
We explore fine-tuning of large language models (LLMs) for algorithm design. Our experiments span three distinct algorithm design tasks. Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts.
arXiv Detail & Related papers (2025-07-13T15:21:23Z) - Prime the search: Using large language models for guiding geometric task and motion planning by warm-starting tree search [21.42328403783795]
A problem of relocating a set of objects to designated areas amidst movable obstacles can be framed as a Geometric Task and Motion Planning (G-TAMP) problem. Traditional approaches to G-TAMP have relied on domain-independent heuristics or on learning from planning experience to guide the search. We propose leveraging Large Language Models (LLMs), which have common sense knowledge acquired from internet-scale data, to guide task planning in G-TAMP problems.
arXiv Detail & Related papers (2025-06-08T09:47:54Z) - Fitness Landscape of Large Language Model-Assisted Automated Algorithm Search [15.767411435705752]
We show and analyze the fitness landscape of Large Language Model-assisted Algorithm Search (LAS). Our findings reveal that LAS landscapes are highly multimodal and rugged. We also demonstrate how population size influences exploration-exploitation trade-offs and the evolving trajectory of elite algorithms.
arXiv Detail & Related papers (2025-04-28T09:52:41Z) - Universal Model Routing for Efficient LLM Inference [72.65083061619752]
We consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived from its predictions on a set of representative prompts. We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors.
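The summary leaves the feature construction and routing rule abstract. Below is a minimal nearest-match sketch of the idea, in which each LLM is summarized by its per-topic accuracy on representative prompts; the `model_features` values, the keyword "embedding", and the dot-product scoring are all illustrative assumptions, not the paper's estimator.

```python
import numpy as np

# Assumed setup: each model's feature vector records its accuracy on four
# clusters of representative prompts (math, code, history, planning).
model_features = {
    "model_a": np.array([0.9, 0.2, 0.8, 0.6]),  # strong on math, history
    "model_b": np.array([0.3, 0.9, 0.4, 0.8]),  # strong on code, planning
}

TOPICS = ["math", "code", "history", "planning"]

def embed_prompt(prompt: str) -> np.ndarray:
    """Stand-in embedding: which topic clusters does the prompt resemble?
    A real router would use a learned encoder instead of keyword matching."""
    return np.array([float(t in prompt.lower()) for t in TOPICS])

def route(prompt: str) -> str:
    """Send the prompt to the model whose cluster-level accuracies best
    match the clusters the prompt falls into."""
    weights = embed_prompt(prompt)
    scores = {name: float(f @ weights) for name, f in model_features.items()}
    return max(scores, key=scores.get)

print(route("Write code for a planning heuristic"))  # -> model_b
print(route("A question about math and history"))    # -> model_a
```

Note how this structure accommodates the paper's test-time setting: a previously unseen LLM can be routed to simply by measuring its accuracies on the same representative prompts and adding one more feature vector.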
arXiv Detail & Related papers (2025-02-12T20:30:28Z) - When Do LLMs Help With Node Classification? A Comprehensive Analysis [21.120619437937382]
We develop a comprehensive testbed for node classification using Large Language Models (LLMs). It includes 10 homophilic datasets, 4 heterophilic datasets, 8 LLM-based algorithms, 8 classic baselines, and 3 learning paradigms. Our findings uncover 8 insights, e.g., (1) LLM-based methods can significantly outperform traditional methods in a semi-supervised setting, while the advantage is marginal in a supervised setting.
arXiv Detail & Related papers (2025-02-02T15:56:05Z) - Meta-Learning Objectives for Preference Optimization [39.15940594751445]
We show that it is possible to gain insights into the efficacy of preference optimization (PO) algorithms on simpler benchmarks. We propose a novel family of PO algorithms based on mirror descent, which we call Mirror Preference Optimization (MPO).
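For orientation, the generic mirror descent update that any MPO-style family builds on takes the standard textbook form (this is background, not the paper's exact objective):

```latex
x_{t+1} = \arg\min_{x \in \mathcal{X}}
  \Big\{ \eta \, \langle \nabla f(x_t), \, x \rangle + D_\psi(x, x_t) \Big\},
\qquad
D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y), \, x - y \rangle,
```

where \psi is a strictly convex mirror map and D_\psi its Bregman divergence; choosing \psi(x) = \tfrac{1}{2}\|x\|_2^2 recovers gradient descent, while negative entropy yields multiplicative (exponentiated-gradient) updates.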
arXiv Detail & Related papers (2024-11-10T19:11:48Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
It has been tested on general and advanced benchmarks, showing superior search efficiency and problem-solving capability.
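The summary names the components but not their wiring; the following is a much-simplified sketch of MCTS over iteratively self-refined solutions, with hypothetical `refine` and `evaluate` stand-ins for the LLM and reward-model calls the paper would use.

```python
import math
import random

class Node:
    def __init__(self, solution, parent=None):
        self.solution = solution   # a candidate reasoning path / answer text
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def refine(solution):
    """Hypothetical Self-Refine step: an LLM critiques the solution and
    rewrites it. Tagged text keeps the sketch runnable without an LLM."""
    return solution + " (refined)"

def evaluate(solution):
    """Hypothetical reward model scoring a solution in [0, 1]."""
    return random.random()

def select(node, c=1.4):
    """Descend via UCT until reaching a node with no children."""
    while node.children:
        node = max(node.children,
                   key=lambda ch: ch.value / (ch.visits + 1e-9)
                   + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
    return node

def mcts(root_solution, iterations=20, expansions=2):
    root = Node(root_solution)
    for _ in range(iterations):
        leaf = select(root)
        for _ in range(expansions):                  # expansion = self-refine
            leaf.children.append(Node(refine(leaf.solution), parent=leaf))
        child = random.choice(leaf.children)
        reward = evaluate(child.solution)            # simulation
        while child:                                 # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda ch: ch.visits).solution

print(mcts("Draft proof of the inequality"))
```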
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning [91.95362946266577]
Path planning is a fundamental scientific problem in robotics and autonomous navigation. Traditional algorithms like A* and its variants can ensure path validity but suffer from significant computational and memory inefficiencies as the state space grows. We propose a new LLM-based route planning method that synergistically combines the precise pathfinding capabilities of A* with the global reasoning capability of LLMs. This hybrid approach aims to improve pathfinding efficiency in time and space complexity while maintaining path validity, especially in large-scale scenarios.
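As a rough illustration of the hybrid idea (not the paper's actual LLM-A* algorithm), one can imagine an LLM proposing coarse waypoints and a classical search stitching them together; the sketch below uses BFS per segment and a hard-coded waypoint list where the LLM call would go.

```python
from collections import deque

def bfs_segment(grid, start, goal):
    """Shortest path between two cells on a 4-connected grid (0 = free)."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in seen):
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

def hybrid_plan(grid, start, goal, waypoints):
    """Stitch LLM-proposed waypoints into a full path with local searches.
    Searching short segments keeps each frontier small on large maps."""
    path, current = [start], start
    for wp in list(waypoints) + [goal]:
        segment = bfs_segment(grid, current, wp)
        if segment is None:
            return None        # waypoint unreachable; re-prompt the LLM
        path += segment[1:]    # drop the duplicated segment start
        current = wp
    return path

grid = [[0] * 5 for _ in range(5)]
grid[2][1] = grid[2][2] = grid[2][3] = 1          # a wall across the middle
# waypoints = ask_llm_for_waypoints(grid, ...)    # hypothetical LLM call
print(hybrid_plan(grid, (0, 0), (4, 4), waypoints=[(2, 4)]))
```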
arXiv Detail & Related papers (2024-06-20T01:24:30Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of LLM-based prompt optimizers. We identify two pivotal factors in model parameter learning: update direction and update method. We develop a capable gradient-inspired LLM-based prompt optimizer called GPO.
arXiv Detail & Related papers (2024-02-27T15:05:32Z) - Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models [17.059322033670124]
We propose a novel strategy that propels Large Language Models through algorithmic reasoning pathways.
Our results suggest that instructing an LLM using an algorithm can lead to performance surpassing that of the algorithm itself.
arXiv Detail & Related papers (2023-08-20T22:36:23Z) - A Metaheuristic Algorithm for Large Maximum Weight Independent Set Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum.
Some of the graphs arising in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges.
We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
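The summary names the GRASP framework without details; below is a minimal greedy randomized construction for maximum weight independent set (one GRASP building block, with the local search phase omitted). The candidate-list parameter `alpha` is a standard GRASP knob, not something specified in the abstract.

```python
import random

def grasp_construct(weights, adj, alpha=0.3):
    """One greedy randomized construction: repeatedly pick a node at random
    from the top-alpha fraction of remaining candidates by weight, then
    discard its neighbors (they can no longer be independent of it)."""
    candidates = set(weights)
    solution = []
    while candidates:
        ranked = sorted(candidates, key=lambda v: weights[v], reverse=True)
        rcl = ranked[: max(1, int(alpha * len(ranked)))]  # restricted candidate list
        chosen = random.choice(rcl)
        solution.append(chosen)
        candidates -= adj[chosen] | {chosen}
    return solution, sum(weights[v] for v in solution)

# Tiny example graph: node weights and adjacency sets.
weights = {"a": 5, "b": 3, "c": 4, "d": 2}
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

# GRASP repeats construction (plus local search, omitted here), keeping the best.
best = max((grasp_construct(weights, adj) for _ in range(10)), key=lambda s: s[1])
print(best)   # e.g. (['a', 'c'], 9)
```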
arXiv Detail & Related papers (2022-03-28T21:34:16Z)