Fugu-MT Paper Translation (Abstract): What Do Evolutionary Coding Agents Evolve?

Paper summary: What Do Evolutionary Coding Agents Evolve?

arxiv url: http://arxiv.org/abs/2605.20086v1
Date: Tue, 19 May 2026 16:41:45 GMT
Status: Translation complete
Last updated in system: 2026-05-20 15:03:09.531526
Title: What Do Evolutionary Coding Agents Evolve?
Title (reference translation): What Do Evolutionary Coding Agents Evolve?
Authors: Nico Pelleriti, Sree Harsha Nelaturu, Zhanke Zhou, Zongze Li, Max Zimmer, Bo Han, Sebastian Pokutta
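Abstract summary: EvoTrace is a dataset of evolutionary coding traces spanning four evolutionary frameworks, reasoning and non-reasoning models, and 16 tasks across mathematics and algorithm design. The traces are analyzed with EvoReplay, a replay-based methodology that reconstructs the local search states behind high-scoring solutions. Across EvoTrace, most score gains come from a small subset of the nine recurring edit types used to annotate code edits.
Reference score (independently computed attention metric): 46.561777689365165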
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, yet a fundamental question remains: what do they actually evolve? Progress is typically summarized by the best score a run reaches under a task-specific evaluator, but that score can reflect several different mechanisms: new algorithmic structure, re-tuning an existing strategy, recombining ideas already in the model's internal knowledge, or overfitting to the evaluator. Distinguishing these mechanisms requires inspecting the search process itself, not only its final outcome. We introduce EvoTrace, a dataset of evolutionary coding traces spanning four evolutionary frameworks, reasoning and non-reasoning models, and 16 tasks across mathematics and algorithm design. To analyze these traces, we develop EvoReplay, a replay-based methodology that reconstructs the local search states behind high-scoring solutions and tests controlled interventions, including adjusting constants, removing program components and substituting models or prompting contexts. We annotate every code edit in EvoTrace with one of nine recurring edit types using an LLM-as-judge pipeline validated against blind human re-annotation. Across EvoTrace, most score gains come from a small subset of these edit types. We further find a deterministic cycling pattern: about 30% of code lines added during search are byte-identical re-introductions of previously-deleted lines, present throughout nearly every run. These results show that benchmark gains in evolutionary coding agents can arise from qualitatively different mechanisms, only some of which correspond to new algorithmic structure. EvoTrace enables more diagnostic evaluation of evolutionary coding agents beyond final benchmark scores.
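Abstract (reference translation): Recent work combines LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have delivered strong results in mathematical discovery and algorithm design, but a fundamental question remains: what do they actually evolve? Progress is usually summarized by the best score a run reaches under a task-specific evaluator, yet that score can reflect several different mechanisms: new algorithmic structure, re-tuning of an existing strategy, recombination of ideas already present in the model's internal knowledge, or overfitting to the evaluator. Telling these mechanisms apart requires inspecting the search process itself, not only the final result. EvoTrace is a dataset of evolutionary coding traces spanning four evolutionary frameworks, reasoning and non-reasoning models, and 16 tasks across mathematics and algorithm design. To analyze these traces, the authors develop EvoReplay, a replay-based methodology that reconstructs the local search states behind high-scoring solutions and tests controlled interventions such as adjusting constants, removing program components, and substituting models or prompting contexts. Every code edit in EvoTrace is annotated with one of nine recurring edit types using an LLM-as-judge pipeline validated against blind human re-annotation. Across EvoTrace, most score gains come from a small subset of these edit types. The authors also find a deterministic cycling pattern: about 30% of the code lines added during search are byte-identical re-introductions of previously deleted lines, and the pattern is present throughout nearly every run. These results show that benchmark gains in evolutionary coding agents can arise from qualitatively different mechanisms, only some of which correspond to new algorithmic structure. EvoTrace enables more diagnostic evaluation of evolutionary coding agents beyond final benchmark scores.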
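The cycling pattern in the abstract (added lines that are byte-identical to previously deleted ones) can be illustrated with a small sketch. This is not the authors' measurement code: the function name `reintroduction_rate`, the use of `difflib` for line diffs, the choice to count only deletions from earlier steps, and the skipping of whitespace-only lines are all assumptions made for illustration.

```python
import difflib


def reintroduction_rate(versions):
    """Fraction of added lines that exactly match a line deleted in an earlier step.

    `versions` is a list of program texts in search order (hypothetical input format).
    """
    deleted_before = set()  # lines removed in any earlier step
    added_total = 0
    reintroduced = 0
    for old, new in zip(versions, versions[1:]):
        a, b = old.splitlines(), new.splitlines()
        deleted_now = []
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                deleted_now.extend(a[i1:i2])
            if tag in ("replace", "insert"):
                for line in b[j1:j2]:
                    if not line.strip():
                        continue  # assumption: ignore whitespace-only lines
                    added_total += 1
                    if line in deleted_before:
                        reintroduced += 1
        # Only deletions from completed steps count as "previously deleted".
        deleted_before.update(line for line in deleted_now if line.strip())
    return reintroduced / added_total if added_total else 0.0


# Toy trace: a constant is changed and then changed back.
trace = ["x = 1\ny = 2\n", "x = 1\ny = 3\n", "x = 1\ny = 2\n"]
print(reintroduction_rate(trace))  # 0.5
```

In the toy trace, two lines are added in total (`y = 3`, then `y = 2`), and the second is a re-introduction of a line deleted one step earlier, which gives 0.5. A real analysis would also need to decide how to handle lineage across multiple parents, which this sketch ignores.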

Related papers