Fugu-MT Paper Translation (Abstract): Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine

Paper overview: Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine

arxiv url: http://arxiv.org/abs/2510.21614v3
Date: Wed, 29 Oct 2025 13:57:25 GMT
Status: Translation complete
Last updated in system: 2025-10-30 13:34:45.427413
Title: Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
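Title (reference translation): Huxley-Gödel Machine: Development of a Human-Level Coding Agent by Approximating the Optimal Self-Improving Machine
Authors: Wenyi Wang, Piotr Piękos, Li Nanbo, Firas Laakom, Yimeng Chen, Mateusz Ostaszewski, Mingchen Zhuge, Jürgen Schmidhuber
Abstract summary: We identify a mismatch between an agent's self-improvement potential and its coding benchmark performance. Inspired by Huxley's concept of clade, we propose a metric ($\mathrm{CMP}$) that aggregates the benchmark performances of an agent's descendants. We introduce the Huxley-Gödel Machine (HGM), which searches the tree of self-modifications by estimating $\mathrm{CMP}$ and using it as guidance.
Reference score (independently computed attention measure): 31.795598366502166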
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent studies operationalize self-improvement through coding agents that edit their own codebases. They grow a tree of self-modifications through expansion strategies that favor higher software engineering benchmark performance, assuming that this implies more promising subsequent self-modifications. However, we identify a mismatch between the agent's self-improvement potential (metaproductivity) and its coding benchmark performance, namely the Metaproductivity-Performance Mismatch. Inspired by Huxley's concept of clade, we propose a metric ($\mathrm{CMP}$) that aggregates the benchmark performances of the descendants of an agent as an indicator of its potential for self-improvement. We show that, in our self-improving coding agent development setting, access to the true $\mathrm{CMP}$ is sufficient to simulate how the Gödel Machine would behave under certain assumptions. We introduce the Huxley-Gödel Machine (HGM), which, by estimating $\mathrm{CMP}$ and using it as guidance, searches the tree of self-modifications. On SWE-bench Verified and Polyglot, HGM outperforms prior self-improving coding agent development methods while using fewer allocated CPU hours. Last but not least, HGM demonstrates strong transfer to other coding datasets and large language models. The agent optimized by HGM on SWE-bench Verified with GPT-5-mini and evaluated on SWE-bench Lite with GPT-5 achieves human-level performance, matching the best officially checked results of human-engineered coding agents. Our code is publicly available at https://github.com/metauto-ai/HGM.
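Abstract (reference translation): Recent studies operationalize self-improvement through coding agents that edit their own codebases. These agents grow a tree of self-modifications using expansion strategies that favor higher software engineering benchmark performance. However, we identify a mismatch between an agent's self-improvement potential (metaproductivity) and its coding benchmark performance, which we call the Metaproductivity-Performance Mismatch. Inspired by Huxley's concept of clade, we propose a metric ($\mathrm{CMP}$) that aggregates the benchmark performances of an agent's descendants as an indicator of its potential for self-improvement. We show that, in the self-improving coding agent development setting, access to the true $\mathrm{CMP}$ is enough to simulate how the Gödel Machine would behave under certain assumptions. We introduce the Huxley-Gödel Machine (HGM), which estimates $\mathrm{CMP}$ and uses it to guide the search over the tree of self-modifications. On SWE-bench Verified and Polyglot, HGM outperforms prior self-improving coding agent development methods while using fewer allocated CPU hours. HGM also transfers strongly to other coding datasets and large language models. The agent optimized by HGM on SWE-bench Verified with GPT-5-mini, when evaluated on SWE-bench Lite with GPT-5, achieves human-level performance, matching the best officially checked results of human-engineered coding agents. Our code is publicly available at https://github.com/metauto-ai/HGM.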
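
The clade idea in the abstract can be illustrated with a small sketch. The abstract does not state how $\mathrm{CMP}$ aggregates descendant performance or how HGM estimates it, so everything below is an assumption made for illustration: the `AgentNode` structure, the pooled pass rate used as the clade score, and the greedy selection rule are hypothetical stand-ins, not the authors' definitions.

```python
from dataclasses import dataclass, field


@dataclass
class AgentNode:
    """One agent in a tree of self-modifications (hypothetical structure)."""
    name: str
    passed: int      # benchmark tasks this agent solved
    attempted: int   # benchmark tasks this agent was evaluated on
    children: list["AgentNode"] = field(default_factory=list)


def clade_counts(node):
    """Sum (passed, attempted) over the node and all of its descendants."""
    passed, attempted = node.passed, node.attempted
    for child in node.children:
        p, a = clade_counts(child)
        passed += p
        attempted += a
    return passed, attempted


def clade_score(node):
    """Pooled pass rate of the clade; an assumed stand-in for a CMP estimate."""
    passed, attempted = clade_counts(node)
    return passed / attempted if attempted else 0.0


def pick_node_to_expand(root):
    """Greedy stand-in for guided search: return the node with the best clade score."""
    best, stack = root, [root]
    while stack:
        node = stack.pop()
        if clade_score(node) > clade_score(best):
            best = node
        stack.extend(node.children)
    return best


# Toy tree: A scores better than B on its own, but B's clade is stronger.
a = AgentNode("A", 6, 10, [AgentNode("A1", 3, 10)])
b = AgentNode("B", 4, 10, [AgentNode("B1", 9, 10)])
root = AgentNode("root", 5, 10, [a, b])
print(clade_score(a), clade_score(b))
```

In this toy tree, ranking by each agent's own pass rate prefers A (6/10 against 4/10), while the clade-level score prefers B, which is the kind of mismatch between own performance and descendant performance that the abstract describes.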

Related papers