Fugu-MT paper translation (abstract): Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

Paper abstract: Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

arxiv url: http://arxiv.org/abs/2509.26354v1
Date: Tue, 30 Sep 2025 14:55:55 GMT
Status: translation complete
Last updated in system: 2025-10-01 17:09:04.590458
Title: Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Title (reference translation): Agent Mistakes: Emergent Risks in Self-Evolving LLM Agents
Authors: Shuai Shao, Qihan Ren, Chen Qian, Boyi Wei, Dadi Guo, Jingyi Yang, Xinhao Song, Linfeng Zhang, Weinan Zhang, Dongrui Liu, Jing Shao
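Abstract summary: We study cases where an agent's self-evolution deviates in unintended ways, leading to undesirable or harmful outcomes. Our empirical findings show that misevolution is a widespread risk that affects even agents built on top-tier LLMs. We discuss potential mitigation strategies to encourage further research on building safer and more trustworthy self-evolving agents.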
Reference score (independently computed attention score): 58.69865074060139
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Advances in Large Language Models (LLMs) have enabled a new class of self-evolving agents that autonomously improve through interaction with the environment, demonstrating strong capabilities. However, self-evolution also introduces novel risks overlooked by current safety research. In this work, we study the case where an agent's self-evolution deviates in unintended ways, leading to undesirable or even harmful outcomes. We refer to this as Misevolution. To provide a systematic investigation, we evaluate misevolution along four key evolutionary pathways: model, memory, tool, and workflow. Our empirical findings reveal that misevolution is a widespread risk, affecting agents built even on top-tier LLMs (e.g., Gemini-2.5-Pro). Different emergent risks are observed in the self-evolutionary process, such as the degradation of safety alignment after memory accumulation, or the unintended introduction of vulnerabilities in tool creation and reuse. To our knowledge, this is the first study to systematically conceptualize misevolution and provide empirical evidence of its occurrence, highlighting an urgent need for new safety paradigms for self-evolving agents. Finally, we discuss potential mitigation strategies to inspire further research on building safer and more trustworthy self-evolving agents. Our code and data are available at https://github.com/ShaoShuai0605/Misevolution . Warning: this paper includes examples that may be offensive or harmful in nature.
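Abstract (reference translation): Advances in large language models (LLMs) have enabled a new type of self-evolving agent that improves autonomously through interaction with its environment and demonstrates strong capabilities. However, self-evolution also introduces new risks that current safety research overlooks. In this work, we study cases where an agent's self-evolution deviates in unintended ways, leading to undesirable or even harmful outcomes. We call this "misevolution". For a systematic investigation, we evaluate misevolution along four key evolutionary pathways: model, memory, tool, and workflow. Our empirical findings show that misevolution is a widespread risk that affects even agents built on top-tier LLMs (e.g., Gemini-2.5-Pro). Different emergent risks are observed during self-evolution, such as degraded safety alignment after memory accumulation and the unintended introduction of vulnerabilities during tool creation and reuse. To our knowledge, this is the first study to systematically conceptualize misevolution and provide empirical evidence of its occurrence, highlighting the urgent need for new safety paradigms for self-evolving agents. Finally, we discuss potential mitigation strategies to encourage further research on building safer and more trustworthy self-evolving agents. Our code and data are available at https://github.com/ShaoShuai0605/Misevolution . Warning: this paper includes examples that may be offensive or harmful in nature.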
