Fugu-MT Paper Translation (Abstract): CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation

Paper overview: CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation

arxiv url: http://arxiv.org/abs/2604.09746v1
Date: Fri, 10 Apr 2026 06:33:57 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:15.650595
Title: CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation
Title (reference translation): CONSCIENTIA: Can LLM Agents Learn Strategically? Emergent Deception and Trust in a Multi-Agent NYC Simulation
Authors: Aarush Sinha, Arion Das, Soumyadeep Nag, Charan Karnati, Shravani Nag, Chandra Vadhan Raj, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das
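Abstract summary: We introduce a large-scale multi-agent simulation in a simplified model of New York City. Blue agents aim to reach their destinations efficiently, while Red agents attempt to divert them toward billboard-heavy routes. Hidden identities make navigation socially mediated, forcing agents to decide when to trust or deceive.
Reference score (independently computed attention score): 15.334072037636881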
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As large language models (LLMs) are increasingly deployed as autonomous agents, understanding how strategic behavior emerges in multi-agent environments has become an important alignment challenge. We take a neutral empirical stance and construct a controlled environment in which strategic behavior can be directly observed and measured. We introduce a large-scale multi-agent simulation in a simplified model of New York City, where LLM-driven agents interact under opposing incentives. Blue agents aim to reach their destinations efficiently, while Red agents attempt to divert them toward billboard-heavy routes using persuasive language to maximize advertising revenue. Hidden identities make navigation socially mediated, forcing agents to decide when to trust or deceive. We study policy learning through an iterative simulation pipeline that updates agent policies across repeated interaction rounds using Kahneman-Tversky Optimization (KTO). Blue agents are optimized to reduce billboard exposure while preserving navigation efficiency, whereas Red agents adapt to exploit remaining weaknesses. Across iterations, the best Blue policy improves task success from 46.0% to 57.3%, although susceptibility remains high at 70.7%. Later policies exhibit stronger selective cooperation while preserving trajectory efficiency. However, a persistent safety-helpfulness trade-off remains: policies that better resist adversarial steering do not simultaneously maximize task completion. Overall, our results show that LLM agents can exhibit limited strategic behavior, including selective trust and deception, while remaining highly vulnerable to adversarial persuasion.
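Abstract (reference translation): As large language models (LLMs) are increasingly deployed as autonomous agents, understanding how strategic behavior emerges in multi-agent environments has become an important alignment challenge. We take a neutral empirical stance and build a controlled environment in which strategic behavior can be directly observed and measured. We introduce a large-scale multi-agent simulation in a simplified model of New York City, in which LLM-driven agents interact under opposing incentives. Blue agents aim to reach their destinations efficiently, while Red agents use persuasive language to divert them toward billboard-heavy routes in order to maximize advertising revenue. Hidden identities make navigation socially mediated, forcing agents to decide when to trust or deceive. We study policy learning through an iterative simulation pipeline that uses Kahneman-Tversky Optimization (KTO). Blue agents are optimized to reduce billboard exposure while preserving navigation efficiency, and Red agents adapt to exploit the remaining weaknesses. Across iterations, the best Blue policy improves the task success rate from 46.0% to 57.3%, but susceptibility is 70.7%. Later policies show stronger selective cooperation while preserving trajectory efficiency. Policies that resist adversarial steering do not simultaneously maximize task completion. These results show that LLM agents exhibit limited strategic behavior, such as selective trust and deception, while remaining vulnerable to adversarial persuasion.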
