Fugu-MT Paper Translation (Abstract): Improved Generalized Planning with LLMs through Strategy Refinement and Reflection

Paper overview: Improved Generalized Planning with LLMs through Strategy Refinement and Reflection

arxiv url: http://arxiv.org/abs/2508.13876v1
Date: Tue, 19 Aug 2025 14:42:18 GMT
Status: translation complete
Last updated in system: 2025-08-20 15:36:31.956629
Title: Improved Generalized Planning with LLMs through Strategy Refinement and Reflection
Title (reference translation): Improving Generalized Planning with LLMs through Strategy Refinement and Reflection
Authors: Katharina Stein, Nils Hodel, Daniel Fišer, Jörg Hoffmann, Michael Katz, Alexander Koller
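Abstract summary: The authors propose an approach that generates the strategy in the form of pseudocode. They extend the Python debugging phase with a reflection step so that the LLM can pinpoint the reason for an observed plan failure. In experiments on 17 benchmark domains, they show that these extensions substantially improve the quality of the generalized plans.
Reference score (attention measure computed independently by this site): 58.79806530685551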
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLMs have recently been used to generate Python programs representing generalized plans in PDDL planning, i.e., plans that generalize across the tasks of a given PDDL domain. Previous work proposed a framework consisting of three steps: the LLM first generates a summary and then a strategy for the domain, both in natural language, and then implements that strategy as a Python program that is debugged on example planning tasks. In that work, only one strategy is generated and passed directly to the program generation. If the strategy is incorrect, its implementation will therefore result in an incorrect generalized plan. Here, we introduce an approach that generates the strategy in the form of pseudocode and enables automatic debugging of the pseudocode, hence allowing us to identify and fix errors prior to the generation of the generalized plan itself. Additionally, we extend the Python debugging phase with a reflection step prompting the LLM to pinpoint the reason for the observed plan failure. Finally, we take inspiration from LLM code generation to produce several program variants and pick the best one. Running experiments on 17 benchmark domains, we show that these extensions substantially improve (and never deteriorate) the quality of the generalized plans. In 12 of the domains, our best Python programs solve all tasks that can be generated with the respective instance generator.
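Abstract (reference translation): LLMs have recently been used to generate Python programs representing generalized plans in PDDL planning, i.e., plans that generalize across the tasks of a given PDDL domain. In previous work, the LLM first generates a summary and then a strategy for the domain, both in natural language, and then implements that strategy as a Python program, which is debugged on example planning tasks. In that work, only one strategy is generated and passed directly to program generation. If the strategy is incorrect, its implementation results in an incorrect generalized plan. This paper introduces an approach that generates the strategy in the form of pseudocode and enables automatic debugging of the pseudocode. Additionally, the Python debugging phase is extended with a reflection step so that the LLM can pinpoint the reason for the observed plan failure. Finally, taking inspiration from LLM code generation, several program variants are produced and the best one is picked. Experiments on 17 benchmark domains show that these extensions substantially improve (and never deteriorate) the quality of the generalized plans. In 12 of the domains, the best Python programs solve all tasks that can be generated with the respective instance generator.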
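The debugging-with-reflection loop and the "generate several variants, pick the best" step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy domain (a task is just a number of packages to deliver), the validator, and every name below (`is_valid`, `coverage`, `debug_with_reflection`, `best_of`, `stub_reflect`, `stub_repair`) are assumptions made for illustration, and the two stub functions stand in for LLM calls. The pseudocode-debugging phase is not modeled here.

```python
from typing import Callable, List, Optional, Sequence

Task = int                       # toy task (assumption): number of packages to deliver
Plan = List[str]
Program = Callable[[Task], Plan]  # a "generalized plan": maps any task to a plan


def is_valid(task: Task, plan: Plan) -> bool:
    """Toy plan validator: exactly one 'deliver' action per package."""
    return plan == ["deliver"] * task


def coverage(program: Program, tasks: Sequence[Task]) -> float:
    """Fraction of tasks for which the program returns a valid plan."""
    solved = 0
    for task in tasks:
        try:
            solved += is_valid(task, program(task))
        except Exception:
            pass  # a crashing program simply fails that task
    return solved / len(tasks)


def first_failure(program: Program, tasks: Sequence[Task]) -> Optional[Task]:
    """Return the first task the program does not solve, or None."""
    for task in tasks:
        try:
            if not is_valid(task, program(task)):
                return task
        except Exception:
            return task
    return None


def debug_with_reflection(program, tasks, reflect, repair, max_rounds=3):
    """Repeat: find a failing task, ask for a diagnosis, then ask for a fix."""
    for _ in range(max_rounds):
        failed = first_failure(program, tasks)
        if failed is None:
            break
        diagnosis = reflect(program, failed)   # reflection step (stubbed LLM call)
        program = repair(program, diagnosis)   # repair step (stubbed LLM call)
    return program


def best_of(variants: Sequence[Program], tasks: Sequence[Task]) -> Program:
    """Pick the variant with the highest coverage on the example tasks."""
    return max(variants, key=lambda p: coverage(p, tasks))


# --- Hypothetical stand-ins for LLM output, for demonstration only ---
def buggy(task: Task) -> Plan:
    return ["deliver"] * (task - 1)   # off-by-one error


def fixed(task: Task) -> Plan:
    return ["deliver"] * task


def stub_reflect(program: Program, task: Task) -> str:
    return f"plan for task {task} has the wrong number of actions"


def stub_repair(program: Program, diagnosis: str) -> Program:
    return fixed


tasks = [1, 2, 5]
variants = [buggy, debug_with_reflection(buggy, tasks, stub_reflect, stub_repair)]
best = best_of(variants, tasks)
print(coverage(buggy, tasks), coverage(best, tasks))
```

In a real setting the validator would be a PDDL plan validator and the stubs would be prompts to an LLM. The sketch only shows how the diagnosis is requested before the repair and how coverage on example tasks can serve as the selection criterion.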
