Fugu-MT Paper Translation (Abstract): Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Paper overview: Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

arxiv url: http://arxiv.org/abs/2605.16205v1
Date: Fri, 15 May 2026 17:23:08 GMT
Status: Translation complete
Last updated in system: 2026-05-18 17:44:16.356427
Title: Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
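Authors: Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman
Abstract summary: Programmatic state abstraction delivers the largest returns per token spent. Distributing deliberation tools across a hierarchy degrades performance relative to hierarchy alone. Hierarchical decomposition without deliberation achieves the best absolute performance for most models.
Reference score (attention score computed independently by this site): 3.774094352572544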
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs. We present a controlled study of compound LLM agent design in CybORG CAGE-2, a cyber defense environment modeled as a Partially Observable Markov Decision Process (POMDP). Reward is non-positive, so all configurations operate in a failure-mitigation mode. Our evaluation spans five model families, six models, and twelve configurations (3,475 episodes) with token-level cost accounting. We vary context representation (raw observations vs. a deterministic state-tracking layer with compressed history), deliberation (self-questioning, self-critique, and self-improvement tools, with optional chain-of-thought prompting), and hierarchical decomposition (monolithic ReAct vs. delegation to specialized sub-agents). We find that: (1) Programmatic state abstraction delivers the largest returns per token spent (RPTS), improving mean return by up to 76% over raw observations. (2) Distributing deliberation tools across a hierarchy degrades performance relative to hierarchy alone for all five model families, reaching up to 3.4$\times$ worse mean return while using 1.8-2.7$\times$ more tokens. We call this destructive pattern a deliberation cascade. (3) Hierarchical decomposition without deliberation achieves the best absolute performance for most models, and context engineering is generally more cost-effective than deliberation. These findings suggest a design principle for structured adversarial POMDPs: invest in programmatic infrastructure and clean task decomposition rather than deeper per-agent reasoning, as these strategies can interfere when combined.
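Because reward in this setting is non-positive, relative figures such as "76% better" or "3.4x worse" are easiest to read as changes in the magnitude of a negative mean return. The sketch below shows one plausible way to compute such figures. It is an assumption for illustration, not the paper's stated formula, and the function names and input values are hypothetical rather than results from the study.

```python
def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional gain of `candidate` over `baseline` mean return.

    Assumes both values are non-positive mean returns and the baseline
    is strictly negative, so "improvement" means moving toward zero.
    """
    if baseline >= 0:
        raise ValueError("baseline mean return must be negative")
    return (candidate - baseline) / abs(baseline)


def worsening_factor(reference: float, candidate: float) -> float:
    """How many times larger the loss of `candidate` is than `reference`."""
    if reference >= 0:
        raise ValueError("reference mean return must be negative")
    return candidate / reference


# Hypothetical mean returns, chosen only to make the arithmetic visible.
raw_observation_return = -100.0
state_tracking_return = -24.0
hierarchy_only_return = -10.0
hierarchy_with_deliberation_return = -34.0

improvement = relative_improvement(raw_observation_return, state_tracking_return)
factor = worsening_factor(hierarchy_only_return, hierarchy_with_deliberation_return)

print(f"relative improvement: {improvement:.0%}")
print(f"worsening factor: {factor:.1f}x")
```

Under this reading, a value closer to zero is better, so a configuration is "worse" when its mean return is more negative than the reference.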

Related papers