Fugu-MT Paper Translation (Abstract): Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems

Paper overview: Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems

arxiv url: http://arxiv.org/abs/2606.06114v2
Date: Sat, 06 Jun 2026 08:23:30 GMT
Status: Translation complete
Last updated in system: 2026-06-09 12:24:31.354453
Title: Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems
Title (reference translation): Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems
Authors: Dianxing Shi, Bowen Wang, Junqi He, Junhao Chen, Yuta Nakashima,
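Abstract summary: Self-evolving agents improve through continual self-play and self-generated learning signals. Agent Norm Correction through Human-like Oversight and Review (ANCHOR) simulates human supervision and delivers feedback at various phases of self-evolution.
Reference score (independently computed attention metric): 30.399085963137836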
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-evolving agents improve through continual self-play and self-generated learning signals, but autonomous evolution can also cause capability degradation and safety drift. Although human feedback has proven effective for static and post-trained agents, its role in self-evolving systems remains underexplored. We introduce Agent Norm Correction through Human-like Oversight and Review (ANCHOR), an LLM-based framework that simulates human supervision and delivers feedback at various phases of self-evolution. With ANCHOR, we evaluate two representative open-source self-evolving agent systems across coding, mathematical reasoning, and safety. Our results show that even limited supervision substantially mitigates safety degradation while preserving stable performance on core evolutionary objectives. Further analysis shows that supervision over the output verification phase is the most effective for intervention, whereas increasing supervision frequency yields diminishing returns. These findings provide empirical evidence and practical guidance for designing more stable, controllable, and human-aligned self-evolving agent systems.
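Abstract (reference translation): Self-evolving agents improve through continual self-play and self-generated learning signals, but autonomous evolution can cause capability degradation and safety drift. Human feedback has proven effective for static and post-trained agents, yet its role in self-evolving systems remains unexplored. We introduce Agent Norm Correction through Human-like Oversight and Review (ANCHOR), an LLM-based framework that simulates human supervision and provides feedback at various phases of self-evolution. Using ANCHOR, we evaluate two representative open-source self-evolving agent systems on coding, mathematical reasoning, and safety. The results show that even limited supervision substantially mitigates safety degradation while maintaining stable performance on the core evolutionary objectives. Further analysis shows that supervising the output verification phase is the most effective point of intervention, whereas increasing supervision frequency yields diminishing returns. These findings provide empirical evidence and practical guidance for designing more stable, controllable, and human-aligned self-evolving agent systems.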

Related papers