Fugu-MT Paper Translation (Summary): AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering

Paper summary: AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering

arxiv url: http://arxiv.org/abs/2601.04620v1
Date: Thu, 08 Jan 2026 05:49:01 GMT
Status: Translation complete
Last updated in system: 2026-01-09 17:01:53.059294
Title: AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering
Title (reference translation): AgentDevel: Self-Evolving LLM Agents as Release Engineering
Authors: Di Zhang
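Abstract summary: AgentDevel is a release engineering pipeline that iteratively runs the current agent. It produces implementation-blind, symptom-level quality signals from execution traces. It aggregates dominant symptom patterns and produces auditable engineering specifications.
Reference score (attention score computed independently by this site): 8.201374511929538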
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in large language model (LLM) agents has largely focused on embedding self-improvement mechanisms inside the agent or searching over many concurrent variants. While these approaches can raise aggregate scores, they often yield unstable and hard-to-audit improvement trajectories, making it difficult to guarantee non-regression or to reason about failures across versions. We reframe agent improvement as release engineering: agents are treated as shippable artifacts, and improvement is externalized into a regression-aware release pipeline. We introduce AgentDevel, a release engineering pipeline that iteratively runs the current agent, produces implementation-blind, symptom-level quality signals from execution traces, synthesizes a single release candidate (RC) via executable diagnosis, and promotes it under flip-centered gating. AgentDevel features three core designs: (i) an implementation-blind LLM critic that characterizes failure appearances without accessing agent internals, (ii) script-based executable diagnosis that aggregates dominant symptom patterns and produces auditable engineering specifications, and (iii) flip-centered gating that prioritizes pass-to-fail regressions and fail-to-pass fixes as first-class evidence. Unlike population-based search or in-agent self-refinement, AgentDevel maintains a single canonical version line and emphasizes non-regression as a primary objective. Experiments on execution-heavy benchmarks demonstrate that AgentDevel yields stable improvements with significantly fewer regressions while producing reproducible, auditable artifacts. Overall, AgentDevel provides a practical development discipline for building, debugging, and releasing LLM agents as software development.
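Abstract (reference translation): Recent progress in large language model (LLM) agents has concentrated on embedding self-improvement mechanisms inside the agent or on searching over many concurrent variants. These approaches can raise aggregate scores, but they often produce unstable, hard-to-audit improvement trajectories, which makes it difficult to guarantee non-regression or to reason about failures across versions. The paper reframes agent improvement as release engineering: agents are treated as shippable artifacts, and improvement is externalized into a regression-aware release pipeline. It introduces AgentDevel, a release engineering pipeline that iteratively runs the current agent, generates implementation-blind, symptom-level quality signals from execution traces, synthesizes a single release candidate (RC) through executable diagnosis, and promotes it under flip-centered gating. AgentDevel has three core designs: (i) an implementation-blind LLM critic that characterizes how failures appear without accessing agent internals; (ii) script-based executable diagnosis that aggregates dominant symptom patterns and produces auditable engineering specifications; (iii) flip-centered gating that treats pass-to-fail regressions and fail-to-pass fixes as first-class evidence. Unlike population-based search or in-agent self-refinement, AgentDevel maintains a single canonical version line and emphasizes non-regression as a primary objective. Experiments on execution-heavy benchmarks show that AgentDevel delivers stable improvements with significantly fewer regressions while producing reproducible, auditable artifacts. Overall, AgentDevel offers a practical development discipline for building, debugging, and releasing LLM agents as software development.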
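
The abstract names flip-centered gating but does not state the promotion rule, so the following is only a minimal sketch of the general idea: compare per-task pass/fail outcomes of the current agent and the release candidate, count the flips in each direction, and gate on them. The names `FlipReport`, `count_flips`, `should_promote`, and the `max_regressions` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FlipReport:
    pass_to_fail: int  # regressions: passed on the current version, fails on the RC
    fail_to_pass: int  # fixes: failed on the current version, passes on the RC
    unchanged: int     # tasks whose outcome did not change


def count_flips(current: Dict[str, bool], candidate: Dict[str, bool]) -> FlipReport:
    """Compare per-task outcomes (True = pass) of two runs over the same tasks."""
    if current.keys() != candidate.keys():
        raise ValueError("both runs must cover the same task ids")
    p2f = sum(1 for t in current if current[t] and not candidate[t])
    f2p = sum(1 for t in current if not current[t] and candidate[t])
    return FlipReport(p2f, f2p, len(current) - p2f - f2p)


def should_promote(report: FlipReport, max_regressions: int = 0) -> bool:
    """Hypothetical gate (an assumption, not the paper's rule): promote only if
    regressions stay within budget and fixes outnumber regressions."""
    within_budget = report.pass_to_fail <= max_regressions
    net_gain = report.fail_to_pass > report.pass_to_fail
    return within_budget and net_gain


if __name__ == "__main__":
    current = {"t1": True, "t2": False, "t3": True, "t4": False}
    candidate = {"t1": True, "t2": True, "t3": True, "t4": False}
    report = count_flips(current, candidate)
    print(report, should_promote(report))
```

A gate like this looks at individual task flips instead of the aggregate score, so a candidate that raises the average while breaking previously passing tasks can still be rejected.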

