Fugu-MT Paper Translation (Abstract): AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

Paper summary: AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

arxiv url: http://arxiv.org/abs/2605.13357v1
Date: Wed, 13 May 2026 11:14:59 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:28.00363
Title: AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents
Authors: Hailin Zhong, Shengxin Zhu,
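Abstract summary: Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The paper proposes a model-harness-environment system in which a runtime harness mediates how a foundation-model agent observes a project, acts on it, receives feedback, and establishes that a change is complete. The framework reframes the central question of autonomous software engineering from whether a foundation model can produce a patch to whether the model-harness-environment system can produce a verifiably correct, attributed, and maintainable change.
Reference score (attention score computed by this site): 1.4323566945483497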
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability. We propose a different locus: software-engineering capability emerges from a model-harness-environment system, in which a runtime substrate -- the harness -- mediates how a foundation-model agent observes a project, acts on it, receives feedback, and establishes that a change is complete. We formalize this substrate as an AI Harness Engineering and identify eleven component responsibilities: task specification, context selection, tool access, project memory, task state, observability, failure attribution, verification, permissions, entropy auditing, and intervention recording. We operationalize the harness through a four-level ladder (H0-H3) that progressively exposes runtime support to the agent, and we propose a trace-based evaluation protocol that converts each agent run into an auditable episode package. Applied to a controlled validation task, the framework yields episode packages whose evidence structure varies systematically with harness level: lower levels produce only a final patch, higher levels produce reproduction logs, failure attributions, deterministic requirement checks, and structured verification reports. The framework reframes the central question of autonomous software engineering from whether a foundation model can produce a patch to whether the model-harness-environment system can produce a verifiably correct, attributed, and maintainable change. We outline a research program for the runtime systems that foundation-model software agents will require.
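The abstract describes each agent run as an auditable "episode package" whose evidence grows with harness level. A minimal Python sketch of such a record follows. It is an illustration only, not the authors' implementation: every class, field, and method name here is hypothetical, and the abstract does not say which specific level (H0-H3) introduces which artifact, so the sketch makes each artifact optional rather than tying it to a level.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class HarnessLevel(IntEnum):
    """The four-level ladder named in the abstract (H0-H3)."""
    H0 = 0
    H1 = 1
    H2 = 2
    H3 = 3


@dataclass
class EpisodePackage:
    """One agent run as an auditable record (hypothetical field names)."""
    level: HarnessLevel
    final_patch: str
    # The artifacts below are the ones the abstract attributes to higher
    # harness levels; which level adds which one is an assumption left open.
    reproduction_log: Optional[str] = None
    failure_attribution: Optional[str] = None
    requirement_checks: Dict[str, bool] = field(default_factory=dict)
    verification_report: Optional[Dict[str, str]] = None

    def evidence_kinds(self) -> List[str]:
        """Return the names of the evidence artifacts present in this run."""
        kinds = ["final_patch"]
        if self.reproduction_log is not None:
            kinds.append("reproduction_log")
        if self.failure_attribution is not None:
            kinds.append("failure_attribution")
        if self.requirement_checks:
            kinds.append("requirement_checks")
        if self.verification_report is not None:
            kinds.append("verification_report")
        return kinds

    def is_verified(self) -> bool:
        """True only if checks were recorded and every one of them passed."""
        return bool(self.requirement_checks) and all(self.requirement_checks.values())


# A low-level run carries only the patch; a high-level run carries more evidence.
low = EpisodePackage(level=HarnessLevel.H0, final_patch="<patch text>")
high = EpisodePackage(
    level=HarnessLevel.H3,
    final_patch="<patch text>",
    reproduction_log="<log text>",
    failure_attribution="<attribution text>",
    requirement_checks={"example_requirement": True},
    verification_report={"status": "pass"},
)
print(low.evidence_kinds())
print(high.evidence_kinds())
```

Under these assumptions, a patch with no recorded requirement checks is never reported as verified, which mirrors the abstract's shift from "can the model produce a patch" to "can the system produce a verifiably correct change".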

Related papers