Fugu-MT Paper Translation (Abstract): Can LLMs Prove Robotic Path Planning Optimality? A Benchmark for Research-Level Algorithm Verification

Paper summary: Can LLMs Prove Robotic Path Planning Optimality? A Benchmark for Research-Level Algorithm Verification

arxiv url: http://arxiv.org/abs/2603.19464v1
Date: Thu, 19 Mar 2026 20:55:46 GMT
Status: Translation complete
System update date: 2026-03-23 19:48:38.882874
Title: Can LLMs Prove Robotic Path Planning Optimality? A Benchmark for Research-Level Algorithm Verification
Title (reference translation): Can LLMs Prove the Optimality of Robotic Path Planning? A Benchmark for Research-Level Algorithm Verification
Authors: Zhengbang Yang, Md. Tasin Tazwar, Minghan Wei, Zhuangdi Zhu
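Abstract summary: This paper introduces the first benchmark for evaluating Large Language Models (LLMs) on approximation-ratio proofs of robotic path planning algorithms. The evaluation shows that even the strongest models struggle to produce fully valid proofs without external domain knowledge.
Reference score (independently computed attention measure): 5.637461397736495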
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Robotic path planning problems are often NP-hard, and practical solutions typically rely on approximation algorithms with provable performance guarantees for general cases. While designing such algorithms is challenging, formally proving their approximation optimality is even more demanding, which requires domain-specific geometric insights and multi-step mathematical reasoning over complex operational constraints. Recent Large Language Models (LLMs) have demonstrated strong performance on mathematical reasoning benchmarks, yet their ability to assist with research-level optimality proofs in robotic path planning remains under-explored. In this work, we introduce the first benchmark for evaluating LLMs on approximation-ratio proofs of robotic path planning algorithms. The benchmark consists of 34 research-grade proof tasks spanning diverse planning problem types and complexity levels, each requiring structured reasoning over algorithm descriptions, problem constraints, and theoretical guarantees. Our evaluation of state-of-the-art proprietary and open-source LLMs reveals that even the strongest models struggle to produce fully valid proofs without external domain knowledge. However, providing LLMs with task-specific in-context lemmas substantially improves reasoning quality, a factor that is more effective than generic chain-of-thought prompting or supplying the ground-truth approximation ratio as posterior knowledge. We further provide fine-grained error analysis to characterize common logical failures and hallucinations, and demonstrate how each error type can be mitigated through targeted context augmentation.
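Abstract (reference translation): Robotic path planning problems are often NP-hard, and practical solutions generally rely on approximation algorithms with provable performance guarantees. Designing such algorithms is difficult, but formally proving their approximation optimality is even more demanding, requiring domain-specific geometric insight and multi-step mathematical reasoning over complex operational constraints. Recent Large Language Models (LLMs) have shown strong performance on mathematical reasoning benchmarks, but their ability to assist with research-level optimality proofs in robotic path planning has not yet been explored in depth. This work introduces the first benchmark for evaluating LLMs on approximation-ratio proofs of robotic path planning algorithms. The benchmark consists of 34 research-grade proof tasks spanning a variety of planning problem types and complexity levels, each requiring structured reasoning over algorithm descriptions, problem constraints, and theoretical guarantees. An evaluation of state-of-the-art proprietary and open-source LLMs shows that even the strongest models struggle to produce fully valid proofs without external domain knowledge. However, providing LLMs with task-specific in-context lemmas substantially improves reasoning quality. The work further provides a fine-grained error analysis characterizing common logical failures and hallucinations, and demonstrates how each error type can be mitigated through targeted context augmentation.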
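Illustrative note: the abstract refers to approximation algorithms with provable performance guarantees. As a minimal sketch of what such a guarantee looks like, the block below uses a classical textbook example that is not taken from the paper or its benchmark: the tree-doubling heuristic for metric TSP, whose tour is provably at most twice the optimal length. All function names (`tour_length`, `mst_preorder_tour`, `optimal_tour_length`) are hypothetical and chosen only for this illustration.

```python
import itertools
import math


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def mst_preorder_tour(points):
    """Tree-doubling heuristic: build an MST (Prim), then visit it in preorder.

    Assumes Euclidean distances, so the triangle inequality holds and
    shortcutting the doubled tree never lengthens the walk.
    """
    n = len(points)
    children = {v: [] for v in range(n)}
    # best[v] = (distance to the tree, closest tree vertex) for each non-tree v
    best = {v: (math.dist(points[0], points[v]), 0) for v in range(1, n)}
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for u in best:
            d = math.dist(points[v], points[u])
            if d < best[u][0]:
                best[u] = (d, v)
    # Iterative preorder traversal starting at the root (vertex 0).
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children[v]))
    return order


def optimal_tour_length(points):
    """Brute-force optimum; only feasible for very small instances."""
    n = len(points)
    return min(
        tour_length(points, (0,) + perm)
        for perm in itertools.permutations(range(1, n))
    )


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0), (1.0, 3.0), (3.0, 3.0)]
    heuristic = tour_length(pts, mst_preorder_tour(pts))
    optimum = optimal_tour_length(pts)
    print(f"empirical ratio: {heuristic / optimum:.3f} (guaranteed <= 2)")
```

The guarantee follows from two inequalities: the MST weighs no more than the optimal tour (removing one edge from the tour leaves a spanning tree), and the shortcut preorder walk is no longer than twice the MST weight. Checking the ratio on one instance only illustrates the bound; the proof tasks described in the abstract ask for the general argument.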

Related papers