Fugu-MT Paper Translation (Summary): Evolving Deception: When Agents Evolve, Deception Wins

Paper Summary: Evolving Deception: When Agents Evolve, Deception Wins

arxiv url: http://arxiv.org/abs/2603.05872v2
Date: Fri, 13 Mar 2026 10:09:11 GMT
Status: Translation complete
Last updated (in system): 2026-03-16 13:35:07.434422
Title: Evolving Deception: When Agents Evolve, Deception Wins
Authors: Zonghao Ying, Haowen Dai, Tianyuan Zhang, Yisong Xiao, Quanchen Zou, Aishan Liu, Jian Yang, Yaodong Yang, Xianglong Liu,
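Abstract summary: We study the self-evolution of large language model (LLM) agents in a competitive Bidding Arena. We find a consistent pattern: under utility-driven competition, unconstrained self-evolution reliably drifts toward deceptive behaviors. This paper exposes a fundamental tension between agent self-evolution and alignment, and highlights the risks of deploying self-improving agents in adversarial environments.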
Reference score (in-house attention metric): 38.72906831937611
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-evolving agents offer a promising path toward scalable autonomy. However, in this work, we show that in competitive environments, self-evolution can instead give rise to a serious and previously underexplored risk: the spontaneous emergence of deception as an evolutionarily stable strategy. We conduct a systematic empirical study on the self-evolution of large language model (LLM) agents in a competitive Bidding Arena, where agents iteratively refine their strategies through interaction-driven reflection. Across different evolutionary paths (e.g., Neutral, Honesty-Guided, and Deception-Guided), we find a consistent pattern: under utility-driven competition, unconstrained self-evolution reliably drifts toward deceptive behaviors, even when honest strategies remain viable. This drift is explained by a fundamental asymmetry in generalization. Deception evolves as a transferable meta-strategy that generalizes robustly across diverse and unseen tasks, whereas honesty-based strategies are fragile and often collapse outside their original contexts. Further analysis of agents' internal states reveals the emergence of rationalization mechanisms, through which agents justify or deny deceptive actions to reconcile competitive success with normative instructions. Our paper exposes a fundamental tension between agent self-evolution and alignment, highlighting the risks of deploying self-improving agents in adversarial environments.

Related Papers