Fugu-MT Paper Translation (Summary): The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

Paper summary: The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

arxiv url: http://arxiv.org/abs/2606.04455v1
Date: Wed, 03 Jun 2026 04:58:17 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.557736
Title: The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?
Authors: Xinyu Lu, Tianshu Wang, Pengbo Wang, zujie wen, Zhiqiang Zhang, Jun Zhou, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
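Abstract summary: This paper introduces the Meta-Agent Challenge (MAC), an evaluation framework designed to test the capacity of frontier models for autonomous agent development. To ensure evaluation integrity, the framework is secured by multi-layer defenses against reward hacking. Meta-agents rarely match human-engineered baseline policies, and the few that do are dominated by proprietary frontier models.
Reference score (independently calculated attention score): 80.24951682268332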
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current AI benchmarks evaluate agents on task execution within human-designed workflows. These evaluations fundamentally fail to measure a critical next-level capability: whether models can autonomously develop agent systems. We introduce the Meta-Agent Challenge (MAC), an evaluation framework designed to test the capacity of frontier models for autonomous agent development. Specifically, a code agent (the meta-agent) is given a sandboxed environment, an evaluation API, and a time limitation to iteratively program an agent artifact that maximizes performance on a held-out test set across five domains. To ensure evaluation integrity, this framework is secured by multi-layer defenses against reward hacking. Leveraging this framework, we demonstrate that meta-agents rarely match human-engineered baseline policies, and the few that do are dominated by proprietary frontier models. Moreover, the design process exhibits high variance, and high optimization pressure surfaces emergent adversarial behaviors like ground-truth exfiltration, highlighting critical deficits in both robustness and model alignment. Ultimately, MAC provides a rigorous, open-source benchmark for autonomous AI research and development, offering an empirical proxy for evaluating recursive self-improvement. Benchmark is publicly available at: https://github.com/ant-research/meta-agent-challenge.
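The evaluation protocol described in the abstract (a meta-agent iteratively building an agent artifact against an evaluation API under a time limit, with final scoring on a held-out test set) can be illustrated with a toy sketch. This is not the MAC implementation: the threshold-classification task, the `EvaluationAPI` class with its call budget, and the random-search `meta_agent` are all hypothetical stand-ins. The single defense shown (the API returns only an aggregate dev-split score and never touches the test split) is an assumed example of one layer, not a description of the paper's actual multi-layer defenses.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical toy task: each example is (input, label).
Example = Tuple[int, int]
# An "agent artifact" here is just a function from input to prediction.
Agent = Callable[[int], int]


@dataclass
class EvaluationAPI:
    """Hypothetical evaluation API exposed to the meta-agent.

    Assumed defense layer: only an aggregate score on the dev split is
    returned (never per-example labels), calls are budgeted, and the
    held-out test split is not reachable through this object. Python
    cannot enforce privacy; a real sandbox would isolate the data.
    """
    dev_set: List[Example]
    max_calls: int = 20
    calls: int = field(default=0, init=False)

    def score(self, agent: Agent) -> float:
        if self.calls >= self.max_calls:
            raise RuntimeError("evaluation budget exhausted")
        self.calls += 1
        correct = sum(agent(x) == y for x, y in self.dev_set)
        return correct / len(self.dev_set)


def make_threshold_agent(t: int) -> Agent:
    """Hypothetical artifact family: predict 1 when x >= t."""
    return lambda x: int(x >= t)


def meta_agent(api: EvaluationAPI, time_limit_s: float, seed: int = 0) -> Agent:
    """Hypothetical meta-agent: random search over candidate artifacts,
    keeping the best one by dev score until time or budget runs out."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit_s
    best_agent, best_score = make_threshold_agent(0), -1.0
    while time.monotonic() < deadline and api.calls < api.max_calls:
        candidate = make_threshold_agent(rng.randint(0, 100))
        s = api.score(candidate)
        if s > best_score:
            best_agent, best_score = candidate, s
    return best_agent


def run_challenge(seed: int = 0) -> float:
    """Build toy data, let the meta-agent work, then score once on test."""
    rng = random.Random(seed)
    data = [(x, int(x >= 50)) for x in (rng.randint(0, 100) for _ in range(200))]
    dev, test = data[:100], data[100:]
    api = EvaluationAPI(dev_set=dev, max_calls=20)
    agent = meta_agent(api, time_limit_s=1.0, seed=seed)
    # Held-out scoring happens once, outside the meta-agent's reach.
    return sum(agent(x) == y for x, y in test) / len(test)
```

Separating the budgeted dev-split API from the one-shot test scoring mirrors the abstract's point that the artifact is judged on held-out data; in this toy, a meta-agent that could read `test` directly would be the analogue of the ground-truth exfiltration the authors report.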

Related papers