Fugu-MT Paper Translation (Summary): How Should Agents Read Demonstrations? Hierarchical Structure Beats Flat Action Logs

Paper overview: How Should Agents Read Demonstrations? Hierarchical Structure Beats Flat Action Logs

arxiv url: http://arxiv.org/abs/2606.20978v1
Date: Thu, 18 Jun 2026 22:57:14 GMT
Status: Translation complete
Last updated in system: 2026-06-26 11:23:06.910315
Title: How Should Agents Read Demonstrations? Hierarchical Structure Beats Flat Action Logs
Title (reference translation): How Should Agents Read Demonstrations? Hierarchical Structure Outperforms Flat Action Logs
Authors: Honjar Xing, Jefferson Lin, Henry Lieberman
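Abstract summary: Programming by Demonstration (PbD) offers a human-centered way to author procedural knowledge for LLM agents. The natural output of a PbD recording is a flat action log, and how this log is organized before being passed to the agent is an open design question with significant consequences for plan quality. We group recorded actions into labeled, hierarchical subgoals and evaluate the effect of this organizational structure in a controlled experiment.
Reference score (attention score computed in-house): 0.21847754147782883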
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Programming by Demonstration (PbD) offers a human-centered way to author procedural knowledge for LLM agents: users communicate what they want by showing rather than by writing prompts or code, making agent authoring accessible to non-programmers. The natural output of a PbD recording is a flat action log, but how this log is organized before being passed to the agent is an open design question with significant consequences for plan quality. We propose grouping recorded actions into labeled, hierarchical subgoals and evaluate the effect of this organizational structure in a controlled experiment. Across 85 web automation tasks, we compare a zero-shot baseline against four demonstration formats that share identical action sequences but differ in structure. On 43 natural-language tasks with vague descriptions, hierarchically grouped demonstrations improve pass rates from 76.7% to 90.7% (paired permutation test, p = 0.034; win-loss 6:0), while flat demonstrations show a smaller, non-significant improvement. On 42 tasks with precise descriptions, no format provides any benefit, confirming that the hierarchical advantage arises specifically when descriptions leave procedural details ambiguous. Ablation shows that subgoal grouping alone drives the effect: preconditions, postconditions, and parameter annotations add no measurable benefit. These results offer a concrete design recommendation for PbD pipelines and, more broadly, for any system that feeds procedural context to an LLM agent: segment action sequences into named subgoal groups rather than presenting flat step lists.
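Abstract (reference translation): Programming by Demonstration (PbD) provides a human-centered way to write procedural knowledge for LLM agents. The natural output of a PbD recording is a flat action log, but how this log is organized before being passed to the agent is an open design question with significant consequences for plan quality. We sort recorded actions into labeled, hierarchical subgoals and evaluate the effect of this organizational structure in a controlled experiment. On 85 web automation tasks, we compared a zero-shot baseline with four demonstration formats that share the same action sequences but differ in structure. On 43 natural-language tasks with vague descriptions, hierarchically grouped demonstrations improved the pass rate from 76.7% to 90.7% (paired permutation test, p = 0.034; win-loss 6:0), while flat demonstrations showed a smaller, non-significant improvement. On 42 tasks with precise descriptions, no format provided any benefit, confirming that the hierarchical advantage arises specifically when the description leaves procedural details ambiguous. Preconditions, postconditions, and parameter annotations provide no measurable benefit. These results offer a concrete design recommendation for PbD pipelines and, more broadly, for any system that supplies procedural context to an LLM agent: split action sequences into named subgoal groups rather than presenting flat step lists.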
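
To make the difference between a flat action log and a hierarchically grouped demonstration concrete, here is a minimal Python sketch. It assumes a simple hypothetical schema: `Action`, `Subgoal`, `render_flat`, and `render_hierarchical` are illustrative names, not the authors' code, and the example task and its subgoal labels are invented. The abstract does not give the paper's actual prompt templates. Both renderings carry the identical action sequence; only the grouping and the labels differ, which mirrors the controlled comparison the abstract describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Action:
    """One recorded UI event (hypothetical schema, not the paper's)."""
    kind: str        # e.g. "click", "type", "press"
    target: str      # human-readable description of the element acted on
    value: str = ""  # text entered, if any

    def describe(self) -> str:
        if self.value:
            return f'{self.kind} "{self.value}" into {self.target}'
        return f"{self.kind} {self.target}"


@dataclass
class Subgoal:
    """A labeled group of consecutive actions (hypothetical schema)."""
    label: str
    actions: list[Action] = field(default_factory=list)


def render_flat(subgoals: list[Subgoal]) -> str:
    """Flat format: one numbered line per action; subgoal labels are dropped."""
    steps = [a for g in subgoals for a in g.actions]
    return "\n".join(f"{i}. {a.describe()}" for i, a in enumerate(steps, start=1))


def render_hierarchical(subgoals: list[Subgoal]) -> str:
    """Grouped format: the same actions in the same order, nested under labels."""
    lines: list[str] = []
    for gi, g in enumerate(subgoals, start=1):
        lines.append(f"Subgoal {gi}: {g.label}")
        for ai, a in enumerate(g.actions, start=1):
            lines.append(f"  {gi}.{ai} {a.describe()}")
    return "\n".join(lines)


# Invented example task; labels and actions are illustrative only.
demo = [
    Subgoal("Search for the product", [
        Action("click", "search box"),
        Action("type", "search box", "usb-c cable"),
        Action("press", "Enter key"),
    ]),
    Subgoal("Add the first result to the cart", [
        Action("click", "first search result"),
        Action("click", "Add to Cart button"),
    ]),
]

if __name__ == "__main__":
    print(render_flat(demo))
    print()
    print(render_hierarchical(demo))
```

Keeping the actions in one list of `Subgoal` objects means the flat format is just a lossy view of the grouped one, so the two prompts cannot drift apart in content. Extra fields such as preconditions or postconditions could be attached to `Subgoal`, but the abstract reports that those annotations added no measurable benefit over grouping alone.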

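The reported win-loss count makes the significance test easy to reconstruct. Below is a sketch of an exact paired permutation (sign-flip) test in Python. The function name and the per-task outcome lists are hypothetical: the lists are derived only from the abstract's aggregates (33/43 = 76.7% and 39/43 = 90.7%, win-loss 6:0), under the assumption of one binary pass/fail outcome per task. This is not the authors' analysis code.

```python
import itertools


def paired_permutation_pvalue(a: list[int], b: list[int]) -> float:
    """Exact two-sided paired permutation (sign-flip) test.

    a[i] and b[i] are scores for the same task under two conditions
    (here 1 = pass, 0 = fail). Under the null hypothesis the condition
    labels are exchangeable within each task, so every per-task
    difference is equally likely to carry either sign.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    # Tied tasks have difference 0 under every sign flip, so dropping them
    # leaves the p-value unchanged and shrinks the enumeration.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    observed = abs(sum(diffs))
    extreme = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / 2 ** len(diffs)


# Per-task outcomes reconstructed from the abstract's aggregates, ASSUMING one
# binary pass/fail per task: 33 tasks pass under both conditions, 6 pass only
# with hierarchical demonstrations (win-loss 6:0), and 4 fail under both.
baseline = [1] * 33 + [0] * 6 + [0] * 4
hierarchical = [1] * 33 + [1] * 6 + [0] * 4

if __name__ == "__main__":
    print(f"baseline pass rate:     {sum(baseline) / len(baseline):.1%}")
    print(f"hierarchical pass rate: {sum(hierarchical) / len(hierarchical):.1%}")
    print(f"exact p-value:          {paired_permutation_pvalue(hierarchical, baseline):.5f}")
```

Under these assumptions the exact p-value is 2/64 ≈ 0.031, because only the all-positive and all-negative sign patterns over the 6 discordant tasks reach |sum| = 6. The abstract reports p = 0.034. It does not say whether the authors enumerated every sign flip or sampled them at random, or how outcomes were aggregated per task, so a small gap between the two values is unsurprising. Full enumeration costs 2^n for n discordant tasks; for larger n, sampling random sign vectors is the usual substitute.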

Related papers