Fugu-MT Paper Translation (Abstract): Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

Paper overview: Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

arxiv url: http://arxiv.org/abs/2605.30621v1
Date: Thu, 28 May 2026 22:16:14 GMT
Status: Translation complete
Last updated in system: 2026-06-01 20:56:50.267418
Title: Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
Title (reference translation): Disentangling the Evolution Capabilities of Self-Evolving LLM Agents
Authors: Minhua Lin, Juncheng Wu, Zijun Wang, Zhan Shi, Yisi Sang, Bing He, Zewen Liu, Tianxin Wei, Zongyu Wu, Zhiwei Zhang, Dakuo Wang, Xiang Zhang, Benoit Dumoulin, Cihang Xie, Yuyin Zhou, Suhang Wang, Hanqing Lu
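Abstract summary: The paper analyzes two harness self-evolution capabilities: (i) harness-updating, the capability to produce useful persistent harness updates from execution evidence; and (ii) harness-benefit, the capability to benefit from updated harnesses during task solving. First, harness-updating is flat in base capability: models from different capability tiers produce harness updates that lead to surprisingly similar gains. Second, harness-benefit is non-monotonic in base capability: weak-tier models benefit little from updated harnesses, mid-tier models benefit most, and strong-tier models benefit less than mid-tier models.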
Reference score (independently computed attention score): 82.27610290890475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM agents are increasingly deployed as systems built around editable external harnesses, including prompts, skills, memories and tools, that shape task execution without changing model parameters. Harness self-evolution adapts such agents by updating these harnesses from execution evidence. Yet it remains unclear whether a model's base capability in task-solving predicts its capabilities in harness self-evolution: which models produce useful harness updates, and which actually benefit from them? We analyze two harness self-evolution capabilities: (i) harness-updating, the capability to produce useful persistent harness updates from execution evidence; (ii) harness-benefit, the capability to benefit from updated harnesses during task solving. Our analysis reveals two findings. First, harness-updating is flat in base capability: models from different capability tiers produce harness updates that lead to surprisingly similar gains; even Qwen3.5-9B's updates yield gains comparable to those of Claude Opus 4.6. Second, harness-benefit is non-monotonic in base capability: weak-tier models benefit little from updated harnesses, mid-tier models benefit most, and strong-tier models benefit less than mid-tier. We trace low gains at the weak tier to two failure modes: weak-tier models may fail to activate relevant harness artifacts, or activate them but fail to follow them faithfully. These findings suggest investing capability budget in the task-solving agent rather than the evolver, and targeting harness invocation and long-horizon instruction following in agent training. Our source code is publicly available at https://github.com/A-EVO-Lab/a-evolve/tree/release/harness-evolution.
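Abstract (reference translation): LLM agents are increasingly deployed as systems built around editable external harnesses, including prompts, skills, memories, and tools, which shape task execution without changing model parameters. Harness self-evolution adapts such agents by updating these harnesses from execution evidence. However, it is unclear whether a model's base task-solving capability predicts its harness self-evolution capabilities. We analyze two such capabilities: (i) harness-updating, the capability to produce useful, persistent harness updates from execution evidence; and (ii) harness-benefit, the capability to benefit from updated harnesses during task solving. The analysis yields two findings. First, harness-updating is flat in base capability: even Qwen3.5-9B's updates yield gains comparable to those of Claude Opus 4.6. Second, harness-benefit is non-monotonic in base capability: weak-tier models benefit little from updated harnesses, mid-tier models benefit most, and strong-tier models benefit less than mid-tier models. Weak-tier models may fail to activate relevant harness artifacts, or may activate them but fail to follow them faithfully. These findings suggest investing capability budget in the task-solving agent rather than the evolver, and targeting harness invocation and long-horizon instruction following in agent training. The source code is available at https://github.com/A-EVO-Lab/a-evolve/tree/release/harness-evolution.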
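
Illustrative sketch (not from the paper): one way to read the two capabilities defined in the abstract is as two averages over a grid of evolver and solver models. The code below assumes that a "gain" is the solver's score with an evolved harness minus its score with the base harness, that harness-updating for an evolver is its mean gain across solvers, and that harness-benefit for a solver is its mean gain across evolvers. These definitions, the function name `disentangle`, and the model labels are assumptions made for illustration; the abstract does not state the paper's exact metrics or protocol.

```python
from statistics import mean


def disentangle(base_scores, evolved_scores):
    """Split harness gains into per-evolver and per-solver averages.

    base_scores:    {solver: score with the unmodified harness}
    evolved_scores: {(evolver, solver): score with the evolver's harness}
    Both layouts are hypothetical, chosen only for this sketch.
    """
    # Gain of each (evolver, solver) pair relative to the solver's baseline.
    gains = {
        (evolver, solver): score - base_scores[solver]
        for (evolver, solver), score in evolved_scores.items()
    }
    evolvers = sorted({e for e, _ in gains})
    solvers = sorted({s for _, s in gains})

    # Assumed proxy for harness-updating: mean gain an evolver's updates produce.
    updating = {
        e: mean(gains[(e, s)] for s in solvers if (e, s) in gains)
        for e in evolvers
    }
    # Assumed proxy for harness-benefit: mean gain a solver extracts from updates.
    benefit = {
        s: mean(gains[(e, s)] for e in evolvers if (e, s) in gains)
        for s in solvers
    }
    return updating, benefit


if __name__ == "__main__":
    # Placeholder numbers for demonstration only; these are NOT results
    # reported in the paper.
    base = {"weak": 20, "mid": 40, "strong": 60}
    evolved = {
        ("evolver_a", "weak"): 22, ("evolver_a", "mid"): 50, ("evolver_a", "strong"): 63,
        ("evolver_b", "weak"): 22, ("evolver_b", "mid"): 52, ("evolver_b", "strong"): 64,
    }
    updating, benefit = disentangle(base, evolved)
    print("harness-updating by evolver:", updating)
    print("harness-benefit by solver:", benefit)
```

Averaging over the other axis keeps the two quantities separate: an evolver's number does not depend on which single solver it was paired with, and a solver's number does not depend on which single evolver wrote its harness.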

Related papers