Fugu-MT Paper Translation (Abstract): A Stackelberg Framework for Resource-Aware LLM Agents: Learning, Repair, and Conditional Guarantees

Paper abstract: A Stackelberg Framework for Resource-Aware LLM Agents: Learning, Repair, and Conditional Guarantees

arxiv url: http://arxiv.org/abs/2606.23026v1
Date: Mon, 22 Jun 2026 08:38:22 GMT
Status: translation complete
Last updated in system: 2026-06-25 00:39:46.179639
Title: A Stackelberg Framework for Resource-Aware LLM Agents: Learning, Repair, and Conditional Guarantees
Authors: Baoxun Wang
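Abstract summary: Large language model (LLM) agents increasingly operate as multi-turn systems that must allocate context, prompt verbosity, and tool access under finite computational budgets. A controller commits to a quality target and a cost incentive, while an executor responds with resource actions over context, prompting, and tool usage. We learn a conditional response model, optimize a leader policy against that model, and repair the resulting policy using real-API calibration and projection.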
Reference score (independently computed attention score): 1.9381445674403615
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) agents increasingly operate as multi-turn systems that must allocate context, prompt verbosity, and tool access under finite computational budgets. Static thresholds are simple, but they are brittle under heterogeneous tasks and evolving session states. We formulate resource governance as a contextual Stackelberg game: a controller commits to a quality target and a cost incentive, while an executor responds with resource actions over context, prompting, and tool usage. We learn a conditional response model, optimize a leader policy against that model, and repair the resulting policy using real-API calibration and projection onto an empirically selected action set. For the restricted game, we establish conditional guarantees for equilibrium existence, follower-response stability, safe-set projection, and transfer from a surrogate environment to the real environment under bounded value error. The primary real-API experiment comprises 300 evaluated turns. Relative to a conservative baseline, the selected repaired controller reduces mean token cost by 17.4% (Welch $p=0.022$), while the measured quality difference is not statistically significant ($p=0.44$). The theoretical results are conditional and the experiments do not estimate their regret or transfer constants; consequently, the evidence establishes a promising repaired operating point, not a certified real-system equilibrium.
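The leader-follower structure described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the discrete action grids, the `quality` and `cost` models, the penalty weight, and the nearest-neighbour `project` step are all hypothetical stand-ins for the learned response model and the empirically selected action set.

```python
from itertools import product

# Hypothetical leader commitments: (quality_target, cost_incentive).
LEADER_ACTIONS = list(product([0.6, 0.7, 0.8, 0.9], [0.0, 0.5, 1.0]))

# Hypothetical follower resource actions: (context_tokens, verbose_prompt, use_tool).
FOLLOWER_ACTIONS = list(product([500, 1000, 2000], [False, True], [False, True]))


def quality(action):
    """Toy quality model (assumption): more resources give higher quality, capped at 1."""
    context, verbose, tool = action
    return min(1.0, 0.5 + 0.0002 * context + 0.05 * verbose + 0.1 * tool)


def cost(action):
    """Toy token cost (assumption): context plus fixed overheads for verbosity and tools."""
    context, verbose, tool = action
    return context + 300 * verbose + 500 * tool


def follower_response(leader_action):
    """Follower best response: avoid missing the target, then minimise priced cost."""
    target, incentive = leader_action

    def utility(a):
        shortfall = max(0.0, target - quality(a))
        return -10000.0 * shortfall - incentive * cost(a)

    return max(FOLLOWER_ACTIONS, key=utility)


def leader_value(leader_action, cost_weight=0.0002):
    """Leader objective evaluated at the follower's response to the commitment."""
    response = follower_response(leader_action)
    return quality(response) - cost_weight * cost(response)


def optimize_leader(actions):
    """Pick the commitment with the best value under the response model."""
    return max(actions, key=leader_value)


def project(leader_action, safe_set):
    """Repair step: snap a commitment to the nearest member of a safe action set."""
    return min(
        safe_set,
        key=lambda s: (s[0] - leader_action[0]) ** 2 + (s[1] - leader_action[1]) ** 2,
    )


if __name__ == "__main__":
    chosen = optimize_leader(LEADER_ACTIONS)
    safe_set = [(0.7, 0.5), (0.8, 0.5), (0.8, 1.0)]  # hypothetical safe set
    repaired = project(chosen, safe_set)
    print("chosen:", chosen, "repaired:", repaired)
    print("follower response:", follower_response(repaired))
```

The sketch enumerates a small grid instead of learning anything, so it only shows the order of operations: model the follower's response, optimize the leader against it, then project the result onto a restricted action set.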
