Fugu-MT Paper Translation (Abstract): Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine

Paper summary: Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine

arxiv url: http://arxiv.org/abs/2510.21614v1
Date: Fri, 24 Oct 2025 16:19:41 GMT
Status: Translation complete
Last updated in system: 2025-10-28 09:00:15.542444
Title: Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
Title (reference translation): Huxley-Gödel Machine: Developing a Human-Level Coding Agent by Approximating the Optimal Self-Improving Machine
Authors: Wenyi Wang, Piotr Piękos, Li Nanbo, Firas Laakom, Yimeng Chen, Mateusz Ostaszewski, Mingchen Zhuge, Jürgen Schmidhuber
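Abstract summary: We identify a mismatch between an agent's self-improvement potential and its coding benchmark performance. Inspired by Huxley's concept of clade, we propose a metric ($\mathrm{CMP}$) that aggregates the benchmark performances of an agent's descendants. We introduce the Huxley-Gödel Machine (HGM), which searches the tree of self-modifications by estimating $\mathrm{CMP}$ and using it as guidance.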
Reference score (independently computed attention score): 31.795598366502166
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent studies operationalize self-improvement through coding agents that edit their own codebases. They grow a tree of self-modifications through expansion strategies that favor higher software engineering benchmark performance, assuming that this implies more promising subsequent self-modifications. However, we identify a mismatch between the agent's self-improvement potential (metaproductivity) and its coding benchmark performance, namely the Metaproductivity-Performance Mismatch. Inspired by Huxley's concept of clade, we propose a metric ($\mathrm{CMP}$) that aggregates the benchmark performances of the descendants of an agent as an indicator of its potential for self-improvement. We show that, in our self-improving coding agent development setting, access to the true $\mathrm{CMP}$ is sufficient to simulate how the Gödel Machine would behave under certain assumptions. We introduce the Huxley-Gödel Machine (HGM), which, by estimating $\mathrm{CMP}$ and using it as guidance, searches the tree of self-modifications. On SWE-bench Verified and Polyglot, HGM outperforms prior self-improving coding agent development methods while using less wall-clock time. Last but not least, HGM demonstrates strong transfer to other coding datasets and large language models. The agent optimized by HGM on SWE-bench Verified with GPT-5-mini and evaluated on SWE-bench Lite with GPT-5 achieves human-level performance, matching the best officially checked results of human-engineered coding agents. Our code is available at https://github.com/metauto-ai/HGM.
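Abstract (reference translation): Recent studies operationalize self-improvement through coding agents that edit their own codebases. These agents grow a tree of self-modifications using expansion strategies that favor higher software engineering benchmark performance. However, we identify a mismatch between an agent's self-improvement potential (metaproductivity) and its coding benchmark performance, which we call the Metaproductivity-Performance Mismatch. Inspired by Huxley's concept of clade, we propose a metric ($\mathrm{CMP}$) that aggregates the benchmark performances of an agent's descendants as an indicator of its potential for self-improvement. We show that, in our self-improving coding agent development setting, access to the true $\mathrm{CMP}$ is sufficient to simulate how the Gödel Machine would behave under certain assumptions. We introduce the Huxley-Gödel Machine (HGM), which estimates $\mathrm{CMP}$ and uses it as guidance to search the tree of self-modifications. On SWE-bench Verified and Polyglot, HGM outperforms prior self-improving coding agent development methods while using less wall-clock time. Finally, HGM transfers strongly to other coding datasets and large language models. The agent optimized by HGM on SWE-bench Verified with GPT-5-mini and evaluated on SWE-bench Lite with GPT-5 achieves human-level performance, matching the best officially checked results of human-engineered coding agents. Our code is available at https://github.com/metauto-ai/HGM.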
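
The mismatch described in the abstract can be illustrated with a toy tree of agents. The sketch below is not the paper's definition of $\mathrm{CMP}$ or of HGM's search. The abstract does not say how descendant performances are aggregated, so this example assumes a plain mean over the clade (the agent plus all of its descendants). The `Agent` class, the function names, and every score are hypothetical values made up for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Agent:
    """A node in a toy tree of self-modifications (hypothetical structure)."""
    name: str
    score: float  # made-up benchmark success rate in [0, 1]
    children: list["Agent"] = field(default_factory=list)


def clade_scores(agent):
    """Collect the scores of the agent and all of its descendants."""
    scores = [agent.score]
    for child in agent.children:
        scores.extend(clade_scores(child))
    return scores


def clade_estimate(agent):
    """Assumed aggregation: mean benchmark score over the whole clade."""
    return mean(clade_scores(agent))


# Agent "a" scores higher itself, but its self-modifications get worse.
a = Agent("a", 0.50, [Agent("a1", 0.30), Agent("a2", 0.35)])
# Agent "b" scores lower itself, but its self-modifications improve.
b = Agent("b", 0.40, [Agent("b1", 0.55), Agent("b2", 0.60)])

candidates = [a, b]
by_score = max(candidates, key=lambda agent: agent.score)
by_clade = max(candidates, key=clade_estimate)

print("expand by own score:     ", by_score.name)
print("expand by clade estimate:", by_clade.name)
```

In this toy setup the two selection rules disagree: ranking by an agent's own score favors `a`, while ranking by the clade-level aggregate favors `b`, whose descendants perform better.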
