Fugu-MT Paper Translation (Overview): Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion

Paper overview: Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion

arxiv url: http://arxiv.org/abs/2606.22226v1
Date: Sat, 20 Jun 2026 20:59:45 GMT
Status: Translation complete
Last updated (in system): 2026-06-25 21:57:35.806067
Title: Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion
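Title (reference translation): Quantifying Theoretical AI Alignment Guarantees: Bounds on Receiver Utility in Bayesian Persuasion
Authors: Eric Yachbes, Eva Tardos
Abstract summary: Misalignment can change how information moves from an AI agent to a human user. We model this as an information advantage. A strategic AI sender may withhold evidence or garble information in order to steer the human's decision.
Reference score (site-computed attention score): 0.0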
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Misalignment can change how information moves from an AI agent to a human user. We model this as an information advantage: the AI agent observes the world state, while the human receiver only knows a prior and must act after seeing the agent's signal. A strategic AI sender may withhold evidence or garble information in order to steer the human's decision. We ask how much useful information can still reach the human when the AI optimizes a misaligned objective. We study a Bayesian persuasion model in which the world state is a bit string of length $n$, the human receiver wants to guess the bits correctly, and a single AI sender wants the receiver to guess as many bits as possible as $1$. For a prior $μ$, let $R_0(μ)$ be the receiver's utility from using only the prior, and let $R_{\max}(μ)$ be the largest receiver utility among signaling schemes that are optimal for the sender. We prove $R_{\max}(μ)/R_0(μ)\leq 3/2$. This bound improves for priors close to $π_μ$, the independent product prior with the same marginals as $μ$: if $μ(x)\geq (1-η)π_μ(x)$ for every state $x$, then $R_{\max}(μ)\leq R_0(μ)+ηn$. We also give a six-bit prior for which $R_{\max}(μ)/R_0(μ)=39/31>5/4$, so no universal $5/4$ bound is possible.
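As a minimal sketch, the quantities in the abstract that do not require solving for a sender-optimal scheme can be computed directly for small $n$. The code assumes the receiver's utility is the expected number of correctly guessed bits, represents a prior $μ$ as a dictionary from bit tuples to exact probabilities, and uses hypothetical helper names (`marginals`, `r0`, `product_prior`, `smallest_eta`, `additive_bound`) that do not come from the paper. It computes $R_0(μ)$ by guessing each bit according to its more likely value, builds $π_μ$, finds the smallest $η\geq 0$ with $μ(x)\geq (1-η)π_μ(x)$ for all $x$, and evaluates $R_0(μ)+ηn$. It does not compute $R_{\max}(μ)$ itself, which would require optimizing over sender-optimal signaling schemes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical helpers (not from the paper). A prior mu is a dict mapping
# n-bit tuples such as (0, 1) to exact probabilities; missing states have mass 0.


def marginals(mu, n):
    """Return p[i] = Pr[x_i = 1] under the prior mu."""
    p = [Fraction(0)] * n
    for x, w in mu.items():
        for i, bit in enumerate(x):
            if bit == 1:
                p[i] += w
    return p


def r0(mu, n):
    """Receiver utility from the prior alone, assuming utility = expected number
    of correctly guessed bits: guess each bit by its more likely value."""
    return sum(max(p_i, 1 - p_i) for p_i in marginals(mu, n))


def product_prior(mu, n):
    """Independent product prior pi_mu with the same marginals as mu."""
    p = marginals(mu, n)
    pi = {}
    for x in product((0, 1), repeat=n):
        w = Fraction(1)
        for i, bit in enumerate(x):
            w *= p[i] if bit == 1 else 1 - p[i]
        pi[x] = w
    return pi


def smallest_eta(mu, n):
    """Smallest eta >= 0 with mu(x) >= (1 - eta) * pi_mu(x) for every state x."""
    eta = Fraction(0)
    for x, w in product_prior(mu, n).items():
        if w > 0:  # states with pi_mu(x) = 0 satisfy the condition for any eta
            eta = max(eta, 1 - mu.get(x, Fraction(0)) / w)
    return eta


def additive_bound(mu, n):
    """Evaluate the upper bound R_0(mu) + eta * n on R_max(mu)."""
    return r0(mu, n) + smallest_eta(mu, n) * n


if __name__ == "__main__":
    # Product prior: bit 0 is 1 w.p. 3/4, bit 1 is 1 w.p. 1/2, independently.
    mu_prod = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
               (1, 0): Fraction(3, 8), (1, 1): Fraction(3, 8)}
    print(r0(mu_prod, 2), smallest_eta(mu_prod, 2), additive_bound(mu_prod, 2))  # 5/4 0 5/4

    # Perfectly correlated prior: both bits equal, each value w.p. 1/2.
    mu_corr = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
    print(r0(mu_corr, 2), smallest_eta(mu_corr, 2), additive_bound(mu_corr, 2))  # 1 1 3

    # Arithmetic check on the six-bit ratio reported in the abstract.
    print(Fraction(5, 4) < Fraction(39, 31) <= Fraction(3, 2))  # True
```

For the product prior $η = 0$, so the bound reduces to $R_{\max}(μ)\leq R_0(μ)$. For the perfectly correlated prior $η = 1$, and $R_0(μ)+ηn = 3$ exceeds the trivial ceiling of $n = 2$ correct bits, which illustrates why the additive bound is useful mainly for priors close to $π_μ$.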


Related papers