Fugu-MT Paper Translation (Abstract): From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

Paper overview: From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

arxiv url: http://arxiv.org/abs/2605.26112v1
Date: Mon, 25 May 2026 17:59:36 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:20.660387
Title: From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
Title (reference translation): From Model Scaling to System Scaling: Scaling the Harness of Agentic AI
Authors: Shangding Gu,
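Abstract summary: This paper examines system scaling as the next bottleneck in agentic AI. We call this shift scaling the harness, treating the structured execution layer around a foundation model as a first-class object of design, evaluation, and optimization. Our claim is that future progress in agentic AI will depend as much on system design as on stronger foundation models.
Reference score (independently computed attention score): 4.802305157491253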
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We refer to this shift as scaling the harness: treating the structured execution layer around a foundation model as a first-class object of design, evaluation, and optimization. Although recent large language models enable agents to use tools, retrieve information, maintain memory, and execute long-horizon workflows, evaluation remains largely model-centric, often reducing agents to final-task success while treating memory, retrieval, tool use, orchestration, verification, and governance as secondary implementation details. This framing is increasingly inadequate because agent performance emerges from the interaction among the foundation model, memory substrate, context constructor, skill-routing layer, orchestration loop, and verification-and-governance layer. Together, these components form the agent harness, which translates model capability into long-horizon agent behavior. We study scaling the harness through three core bottlenecks: context governance, trustworthy memory, and dynamic skill routing, together with the orchestration and governance mechanisms that coordinate and constrain them. We further outline a research agenda for harness-level benchmarks that go beyond one-shot task success to measure trajectory quality, memory hygiene, context efficiency, communication fidelity, verification cost, and safe evolution over time. To make the discussion concrete, we develop CheetahClaws: https://github.com/SafeRL-Lab/cheetahclaws, a Python-native reference harness, and compare it with Claude Code and OpenClaw. Our main claim is that future progress in agentic AI will depend as much on system design as on stronger foundation models.
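Abstract (reference translation): This paper examines system scaling, not only model scaling, as the next bottleneck in agentic AI: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We treat the structured execution layer around a foundation model as a first-class object of design, evaluation, and optimization. Recent large language models enable agents to use tools, retrieve information, maintain memory, and execute long-horizon workflows, but evaluation remains model-centric, often reducing agents to final-task success by treating memory, retrieval, tool use, orchestration, verification, and governance as secondary implementation details. This framing is increasingly inadequate because agent performance emerges from the interaction among the foundation model, memory substrate, context constructor, skill-routing layer, orchestration loop, and verification-and-governance layer. Together, these components form the agent harness, which translates model capability into long-horizon agent behavior. We study scaling the harness through three bottlenecks: context governance, trustworthy memory, and dynamic skill routing, together with the orchestration and governance mechanisms that coordinate and constrain them. We further outline a research agenda for harness-level benchmarks that go beyond one-shot task success to measure trajectory quality, memory hygiene, context efficiency, communication fidelity, verification cost, and safe evolution over time. To make the discussion concrete, we developed CheetahClaws: https://github.com/SafeRL-Lab/cheetahclaws. Our claim is that future progress in agentic AI will depend as much on system design as on stronger foundation models.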
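
The abstract names six interacting parts of an agent harness: the foundation model, memory substrate, context constructor, skill-routing layer, orchestration loop, and verification-and-governance layer. The following is a minimal sketch of how such parts might fit together, written in Python because the abstract describes CheetahClaws as Python-native. Every name here (`Harness`, `build_context`, `route`, `verify`, `toy_model`, `context_budget`) is hypothetical and chosen for illustration; none of it is taken from CheetahClaws, Claude Code, or OpenClaw, and the "model" is a toy function rather than a real foundation model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Harness:
    """Hypothetical agent harness for illustration; not the CheetahClaws API."""
    model: Callable[[str], str]                      # stand-in for a foundation model
    skills: Dict[str, Callable[[str], str]]          # skill name -> tool function
    memory: List[str] = field(default_factory=list)  # memory substrate
    trace: List[dict] = field(default_factory=list)  # audit log of every step
    context_budget: int = 3                          # max memory items per prompt

    def build_context(self, task: str) -> str:
        # Context constructor: include only the most recent memory items.
        recent = self.memory[-self.context_budget:]
        return "\n".join(recent + [task])

    def route(self, decision: str) -> str:
        # Skill routing: the model's decision has the form "skill:argument".
        name, _, arg = decision.partition(":")
        if name not in self.skills:
            raise KeyError(f"unknown skill: {name}")
        return self.skills[name](arg)

    def verify(self, result: str) -> bool:
        # Verification and governance: a trivial non-empty check in this sketch.
        return bool(result.strip())

    def run(self, task: str, max_steps: int = 3) -> str:
        # Orchestration loop: context -> model -> skill -> verify -> memory.
        for step in range(max_steps):
            decision = self.model(self.build_context(task))
            result = self.route(decision)
            ok = self.verify(result)
            self.trace.append(
                {"step": step, "decision": decision, "result": result, "verified": ok}
            )
            if ok:
                self.memory.append(f"{task} -> {result}")
                return result
        raise RuntimeError("no verified result within the step budget")


def toy_model(context: str) -> str:
    # Toy policy: always asks the "upper" skill to process the last context line.
    last_line = (context.splitlines() or [""])[-1]
    return "upper:" + last_line


harness = Harness(model=toy_model, skills={"upper": str.upper})
answer = harness.run("hello harness")
```

Because every step is appended to `trace`, the run can be inspected afterwards, which is one simple way to make trajectory-level properties (rather than only final-task success) observable in a sketch like this.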

Related papers