Fugu-MT Paper Translation (Abstract): Learning to Lead: Incentivizing Strategic Agents in the Dark

Paper overview: Learning to Lead: Incentivizing Strategic Agents in the Dark

arxiv url: http://arxiv.org/abs/2506.08438v1
Date: Tue, 10 Jun 2025 04:25:04 GMT
Status: Translation complete
Last updated in system: 2025-06-11 15:11:41.465563
Title: Learning to Lead: Incentivizing Strategic Agents in the Dark
Title (reference translation): Learning to Become a Leader: Incentivizing Strategic Agents in the Dark
Authors: Yuchen Wu, Xinyi Zhong, Zhuoran Yang
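Abstract summary: We study an online learning version of the generalized principal-agent model. We develop the first provably sample-efficient algorithm for this challenging setting. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy.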
Reference score (independently computed attention measure): 50.93875404941184
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study an online learning version of the generalized principal-agent model, where a principal interacts repeatedly with a strategic agent possessing private types, private rewards, and taking unobservable actions. The agent is non-myopic, optimizing a discounted sum of future rewards, and may strategically misreport types to manipulate the principal's learning. The principal, observing only her own realized rewards and the agent's reported types, aims to learn an optimal coordination mechanism that minimizes strategic regret. We develop the first provably sample-efficient algorithm for this challenging setting. Our approach features a novel pipeline that combines (i) a delaying mechanism to incentivize approximately myopic agent behavior, (ii) an innovative reward angle estimation framework that uses sector tests and a matching procedure to recover type-dependent reward functions, and (iii) a pessimistic-optimistic LinUCB algorithm that enables the principal to explore efficiently while respecting the agent's incentive constraints. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy, where $\tilde{O}(\cdot)$ omits logarithmic factors. Our results open up new avenues for designing robust online learning algorithms for a wide range of game-theoretic settings involving private types and strategic agents.
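Abstract (reference translation): We study an online learning version of the generalized principal-agent model, in which a principal interacts repeatedly with a strategic agent that has private types and private rewards and takes unobservable actions. The agent is non-myopic: it optimizes a discounted sum of future rewards and may strategically misreport its type to manipulate the principal's learning. The principal observes only her own realized rewards and the agent's reported types, and aims to learn an optimal coordination mechanism that minimizes strategic regret. We develop the first provably sample-efficient algorithm for this challenging setting. Our approach features a novel pipeline that combines (i) a delaying mechanism that incentivizes approximately myopic agent behavior, (ii) an innovative reward angle estimation framework that uses sector tests and a matching procedure to recover type-dependent reward functions, and (iii) a pessimistic-optimistic LinUCB algorithm that lets the principal explore efficiently while respecting the agent's incentive constraints. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy, where $\tilde{O}(\cdot)$ omits logarithmic factors. This work opens up new avenues for designing robust online learning algorithms for a wide range of game-theoretic settings involving private types and strategic agents.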
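For readers unfamiliar with the LinUCB building block named in component (iii), the following is a minimal sketch of the textbook LinUCB rule for a linear bandit with a shared parameter vector. It is not the paper's pessimistic-optimistic variant and it ignores the agent's incentive constraints, the delaying mechanism, and reward angle estimation. The class name `LinUCB`, the helper `run`, the arm features, the true parameter `theta_star`, and the noise level are all assumptions made up for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class LinUCB:
    """Textbook LinUCB (illustrative only, not the paper's algorithm)."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.dim = dim
        self.alpha = alpha  # width of the confidence bonus (assumed value)
        # Inverse of the regularized Gram matrix, initially (ridge * I)^{-1}.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * feature

    def _apply_A_inv(self, x):
        return [dot(row, x) for row in self.A_inv]

    def theta(self):
        """Ridge-regression estimate of the unknown parameter."""
        return self._apply_A_inv(self.b)

    def ucb(self, x):
        """Optimistic reward estimate: mean plus exploration bonus."""
        width = math.sqrt(max(dot(x, self._apply_A_inv(x)), 0.0))
        return dot(self.theta(), x) + self.alpha * width

    def select(self, arms):
        """Index of the arm with the largest upper confidence bound."""
        return max(range(len(arms)), key=lambda k: self.ucb(arms[k]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse Gram matrix.
        Ax = self._apply_A_inv(x)
        denom = 1.0 + dot(x, Ax)
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]


def run(T=2000, seed=0):
    """Simulate T rounds on a toy problem; return (cumulative regret, estimate)."""
    rng = random.Random(seed)
    theta_star = [0.8, -0.3]                      # hypothetical true parameter
    arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]   # hypothetical arm features
    best = max(dot(theta_star, a) for a in arms)
    learner = LinUCB(dim=2, alpha=1.0)
    regret = 0.0
    for _ in range(T):
        k = learner.select(arms)
        mean = dot(theta_star, arms[k])
        learner.update(arms[k], mean + rng.gauss(0.0, 0.1))
        regret += best - mean
    return regret, learner.theta()


if __name__ == "__main__":
    total_regret, estimate = run()
    print("cumulative regret:", total_regret)
    print("estimated parameter:", estimate)
```

The sketch is purely optimistic: it always plays the arm with the highest upper confidence bound. The abstract indicates that the paper's variant additionally has to respect the agent's incentive constraints, which this toy version does not model.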
