Fugu-MT Paper Translation (Summary): GNNVerifier: Graph-based Verifier for LLM Task Planning

Paper summary: GNNVerifier: Graph-based Verifier for LLM Task Planning

arxiv url: http://arxiv.org/abs/2603.14730v2
Date: Tue, 17 Mar 2026 04:26:31 GMT
Status: Translation complete
Last updated in system: 2026-03-18 13:19:43.945456
Title: GNNVerifier: Graph-based Verifier for LLM Task Planning
Title (reference translation): GNNVerifier: A Graph-Based Verifier for LLM Task Planning
Authors: Yu Hao, Qiuyu Wang, Cheng Yang, Yawen Li, Zhiqiang Zhang, Chuan Shi
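Abstract summary: Large language models (LLMs) facilitate the development of autonomous agents. Recent research has introduced plan verifiers to identify and correct potential flaws, but most existing approaches rely on an LLM as the verifier. We propose a graph-based verifier for LLM task planning.
Reference score (attention score computed by the site): 26.77252346424261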
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) facilitate the development of autonomous agents. As a core component of such agents, task planning aims to decompose complex natural language requests into concrete, solvable sub-tasks. Since LLM-generated plans are frequently prone to hallucinations and sensitive to long-context prompts, recent research has introduced plan verifiers to identify and correct potential flaws. However, most existing approaches still rely on an LLM as the verifier via additional prompting for plan review or self-reflection. LLM-based verifiers can be misled by plausible narration and struggle to detect failures caused by structural relations across steps, such as type mismatches, missing intermediates, or broken dependencies. To address these limitations, we propose a graph-based verifier for LLM task planning. Specifically, the proposed method has four major components: Firstly, we represent a plan as a directed graph with enriched attributes, where nodes denote sub-tasks and edges encode execution order and dependency constraints. Secondly, a graph neural network (GNN) performs structural evaluation and diagnosis, producing a graph-level plausibility score for plan acceptance as well as node/edge-level risk scores to localize erroneous regions. Thirdly, we construct controllable perturbations from ground truth plan graphs, and automatically generate training data with fine-grained annotations. Finally, guided by the feedback from our GNN verifier, we enable an LLM to conduct local edits (e.g., tool replacement or insertion) to correct the plan when the graph-level score is insufficient. Extensive experiments across diverse datasets, backbone LLMs, and planners demonstrate that our GNNVerifier achieves significant gains in improving plan quality. Our data and code are available at https://github.com/BUPT-GAMMA/GNNVerifier.
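Abstract (reference translation): Large language models (LLMs) facilitate the development of autonomous agents. As a core component of such agents, task planning aims to decompose complex natural-language requests into concrete, solvable sub-tasks. Because LLM-generated plans are often prone to hallucination and sensitive to long-context prompts, recent research has introduced plan verifiers to identify and correct potential flaws. However, most existing approaches rely on an LLM as the verifier, via additional prompting for plan review or self-reflection. LLM-based verifiers can be misled by plausible narration and struggle to detect failures caused by structural relations between steps, such as type mismatches, missing intermediate steps, or broken dependencies. To address these limitations, we propose a graph-based verifier for LLM task planning. Specifically, the proposed method has four main components. First, we represent a plan as a directed graph with rich attributes, in which nodes denote sub-tasks and edges encode execution order and dependency constraints. Second, a graph neural network (GNN) performs structural evaluation and diagnosis, producing a graph-level plausibility score for plan acceptance and node/edge-level risk scores that localize erroneous regions. Third, we construct controllable perturbations from ground-truth plan graphs and automatically generate training data with fine-grained annotations. Finally, guided by feedback from the GNN verifier, an LLM performs local edits (e.g., tool replacement or insertion) to correct the plan when the graph-level score is insufficient. Extensive experiments across diverse datasets, backbone LLMs, and planners show that GNNVerifier substantially improves plan quality. Our data and code are available at https://github.com/BUPT-GAMMA/GNNVerifier.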
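The first and third components (a plan as an attributed directed graph, and training data built by perturbing ground-truth plan graphs) can be illustrated with a minimal Python sketch. It assumes hypothetical node attributes (a tool name plus input and output types), hypothetical tools (`ocr`, `translate`, `tts`, `detect`), and two illustrative perturbations, a tool swap and a dropped intermediate step, each returning node-, edge- and graph-level labels. The names `Step`, `PlanGraph`, `type_mismatch_edges`, `perturb_tool` and `drop_node` are invented for this example and do not come from the paper or its repository; the authors' actual attributes and perturbation set may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """A sub-task node. These attributes (tool, in_type, out_type) are hypothetical."""
    tool: str
    in_type: str
    out_type: str


@dataclass
class PlanGraph:
    nodes: list[Step]
    # Directed edges (src, dst): dst runs after src and consumes its output.
    edges: list[tuple[int, int]] = field(default_factory=list)


def type_mismatch_edges(g: PlanGraph) -> list[tuple[int, int]]:
    """Edges whose producer output type differs from the consumer input type."""
    return [(s, d) for s, d in g.edges if g.nodes[s].out_type != g.nodes[d].in_type]


def perturb_tool(g: PlanGraph, idx: int, wrong: Step):
    """Hypothetical perturbation: swap the tool at node idx and label the damage."""
    nodes = list(g.nodes)
    nodes[idx] = wrong
    edges = list(g.edges)
    node_labels = [int(i == idx) for i in range(len(nodes))]  # 1 = erroneous node
    edge_labels = [int(idx in (s, d)) for s, d in edges]      # 1 = erroneous edge
    return PlanGraph(nodes, edges), node_labels, edge_labels, 0  # graph label 0 = flawed


def drop_node(g: PlanGraph, idx: int):
    """Hypothetical perturbation: delete node idx and bridge predecessors to successors."""
    preds = [s for s, d in g.edges if d == idx]
    succs = [d for s, d in g.edges if s == idx]
    # Re-index the surviving nodes so indices stay contiguous.
    remap = {old: new for new, old in enumerate(i for i in range(len(g.nodes)) if i != idx)}
    nodes = [n for i, n in enumerate(g.nodes) if i != idx]
    kept = [(remap[s], remap[d]) for s, d in g.edges if idx not in (s, d)]
    bridged = [(remap[p], remap[q]) for p in preds for q in succs]
    edge_labels = [0] * len(kept) + [1] * len(bridged)  # bridged edges mark the gap
    return PlanGraph(nodes, kept + bridged), [0] * len(nodes), edge_labels, 0


# Toy ground-truth plan: read text from an image, translate it, then speak it.
plan = PlanGraph(
    nodes=[Step("ocr", "image", "text"), Step("translate", "text", "text"),
           Step("tts", "text", "audio")],
    edges=[(0, 1), (1, 2)],
)
swapped, swap_nodes, swap_edges, _ = perturb_tool(plan, 1, Step("detect", "image", "objects"))
shortened, _, short_edges, _ = drop_node(plan, 1)
print(type_mismatch_edges(plan), type_mismatch_edges(swapped), shortened.edges, short_edges)
```

In this toy plan the tool swap produces two type-mismatched edges, while dropping `translate` leaves a bridge from `ocr` to `tts` whose types still line up, so the simple type check does not flag it and only the perturbation label marks that edge. This small case illustrates why labels derived from known perturbations can serve as training targets for failures that a hand-written rule would miss.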
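The second and fourth components (GNN scoring at graph, node and edge level, then deciding whether to accept the plan or send it back for local edits) can be sketched in plain Python. This is an assumption-laden illustration, not the authors' model: it uses a single mean-aggregation message-passing layer that treats edges as undirected, one hypothetical input feature (whether a node's input type matches a predecessor's output type), hand-set untrained weights in a hypothetical `PARAMS` dictionary, mean pooling for the graph-level readout, and a hypothetical `gate` function with a 0.5 acceptance threshold. The LLM that would carry out the actual edit is not shown.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    return [dot(row, v) for row in W]


def input_mismatch(types, edges):
    """Hypothetical feature: 1.0 if a node's input type differs from a predecessor's output."""
    bad = [0.0] * len(types)
    for s, d in edges:
        if types[s][1] != types[d][0]:
            bad[d] = 1.0
    return bad


def gnn_layer(h, edges, W_self, W_nbr):
    """One mean-aggregation message-passing layer (edges treated as undirected here)."""
    nbrs = [[] for _ in h]
    for s, d in edges:
        nbrs[s].append(d)
        nbrs[d].append(s)
    out = []
    for i, hi in enumerate(h):
        if nbrs[i]:
            agg = [sum(h[j][k] for j in nbrs[i]) / len(nbrs[i]) for k in range(len(hi))]
        else:
            agg = [0.0] * len(hi)
        pre = [a + b for a, b in zip(matvec(W_self, hi), matvec(W_nbr, agg))]
        out.append([max(0.0, x) for x in pre])  # ReLU
    return out


# Hand-set, untrained weights (hypothetical) so the example is deterministic.
PARAMS = {
    "W_self": [[1.0, 0.0], [0.0, 1.0]],
    "W_nbr": [[0.5, 0.0], [0.0, 0.0]],
    "w_node": [4.0, 0.0], "b_node": -2.0,
    "w_edge": 2.0, "b_edge": -2.0,
    "w_graph": [-4.0, 0.0], "b_graph": 2.0,
}


def verify(types, edges, p=PARAMS):
    """Return (graph plausibility score, per-node risks, per-edge risks)."""
    h0 = [[m, 1.0] for m in input_mismatch(types, edges)]  # [mismatch flag, constant]
    h = gnn_layer(h0, edges, p["W_self"], p["W_nbr"])
    node_risk = [sigmoid(dot(p["w_node"], hi) + p["b_node"]) for hi in h]
    edge_risk = [sigmoid(p["w_edge"] * (h[s][0] + h[d][0]) + p["b_edge"]) for s, d in edges]
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean-pool readout
    graph_score = sigmoid(dot(p["w_graph"], pooled) + p["b_graph"])
    return graph_score, node_risk, edge_risk


def gate(graph_score, node_risk, threshold=0.5, k=2):
    """Accept the plan, or return the k riskiest nodes as targets for local edits."""
    if graph_score >= threshold:
        return "accept", []
    ranked = sorted(range(len(node_risk)), key=lambda i: node_risk[i], reverse=True)
    return "revise", ranked[:k]


edges = [(0, 1), (1, 2)]
clean = [("image", "text"), ("text", "text"), ("text", "audio")]      # (in_type, out_type)
broken = [("image", "text"), ("image", "objects"), ("text", "audio")]  # tool 1 swapped

clean_score, clean_nodes, _ = verify(clean, edges)
bad_score, bad_nodes, bad_edges = verify(broken, edges)
clean_decision = gate(clean_score, clean_nodes)
bad_decision = gate(bad_score, bad_nodes)
print(clean_decision, bad_decision)
```

With these weights the clean plan is accepted, while the plan with a swapped tool at node 1 receives a low graph score and the gate returns nodes 2 and 1 (the swapped tool and its downstream consumer) as edit targets. In a trained verifier the weights would be learned from labeled perturbations rather than set by hand.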

List of related papers