Fugu-MT paper translation (abstract): Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks

Paper abstract: Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks

arxiv url: http://arxiv.org/abs/2509.06701v1
Date: Mon, 08 Sep 2025 13:55:01 GMT
Status: translation complete
System update date: 2025-09-09 14:07:04.17413
Title: Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks
Authors: Su Hyeong Lee, Risi Kondor, Richard Ngo
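Abstract summary: We develop a theory of intelligent agency grounded in probabilistic modeling for neural models. Strict unanimity is impossible under linear pooling or in binary outcome spaces, but possible with three or more outcomes.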
Reference score (independently computed attention score): 7.4145864319417285
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We develop a theory of intelligent agency grounded in probabilistic modeling for neural models. Agents are represented as outcome distributions with epistemic utility given by log score, and compositions are defined through weighted logarithmic pooling that strictly improves every member's welfare. We prove that strict unanimity is impossible under linear pooling or in binary outcome spaces, but possible with three or more outcomes. Our framework admits recursive structure via cloning invariance, continuity, and openness, while tilt-based analysis rules out trivial duplication. Finally, we formalize an agentic alignment phenomenon in LLMs using our theory: eliciting a benevolent persona ("Luigi") induces an antagonistic counterpart ("Waluigi"), while a manifest-then-suppress Waluigi strategy yields strictly larger first-order misalignment reduction than pure Luigi reinforcement alone. These results clarify how developing a principled mathematical framework for how subagents can coalesce into coherent higher-level entities provides novel implications for alignment in agentic AI systems.
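The pooling operations named in the abstract can be illustrated with a minimal sketch. It assumes strictly positive finite distributions and takes an agent's welfare to be the expected log score of the pooled distribution under that agent's own belief; that welfare definition, and the function names `log_pool`, `linear_pool`, and `expected_log_score`, are illustrative assumptions and are not taken from the paper.

```python
import math


def log_pool(dists, weights):
    """Weighted logarithmic pool: q(x) proportional to prod_i p_i(x) ** w_i.

    Assumes every distribution is strictly positive and the weights are
    non-negative and sum to 1 (assumptions of this sketch).
    """
    n = len(dists[0])
    unnorm = [
        math.prod(p[x] ** w for p, w in zip(dists, weights))
        for x in range(n)
    ]
    z = sum(unnorm)  # normalizing constant
    return [u / z for u in unnorm]


def linear_pool(dists, weights):
    """Weighted linear pool: q(x) = sum_i w_i * p_i(x)."""
    n = len(dists[0])
    return [sum(w * p[x] for p, w in zip(dists, weights)) for x in range(n)]


def expected_log_score(p, q):
    """Expected log score of q under belief p: sum_x p(x) * log q(x).

    Used here as a hypothetical stand-in for an agent's welfare.
    """
    return sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)


# Two hypothetical agents over three outcomes.
p1 = [0.6, 0.3, 0.1]
p2 = [0.2, 0.3, 0.5]
weights = [0.5, 0.5]

q_log = log_pool([p1, p2], weights)
q_lin = linear_pool([p1, p2], weights)

for name, q in (("log pool", q_log), ("linear pool", q_lin)):
    print(name, [round(v, 4) for v in q])
    print("  agent 1 score:", expected_log_score(p1, q))
    print("  agent 2 score:", expected_log_score(p2, q))
```

The sketch only computes the two pools and the scores each agent assigns to them; it does not reproduce the paper's unanimity or impossibility results.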

Related papers