Fugu-MT 論文翻訳(概要): Corrupted Learning Dynamics in Games

論文の概要: Corrupted Learning Dynamics in Games

arxiv url: http://arxiv.org/abs/2412.07120v2
Date: Sun, 16 Feb 2025 19:18:39 GMT
ステータス: 翻訳完了
システム内更新日: 2025-02-18 16:13:23.212026
Title: Corrupted Learning Dynamics in Games
Title（参考訳）: ゲームにおける破壊的学習ダイナミクス
Authors: Taira Tsuchiya, Shinji Ito, Haipeng Luo,
Abstract要約: すべてのプレイヤーが楽観的な追従型リーダー(OFTRL)に従うと、平衡は$O(log T)$の速さで計算できる。本稿では,各プレイヤーが所定のアルゴリズムによって提案される戦略から逸脱する程度に依存する速度で,適応的に平衡を求める学習ダイナミクスを提案する。
参考スコア（独自算出の注目度）: 62.73758165845971
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning in games refers to scenarios where multiple players interact in a shared environment, each aiming to minimize their regret. An equilibrium can be computed at a fast rate of $O(1/T)$ when all players follow the optimistic follow-the-regularized-leader (OFTRL). However, this acceleration is limited to the honest regime, in which all players adhere to a prescribed algorithm -- a situation that may not be realistic in practice. To address this issue, we present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm. First, in two-player zero-sum corrupted games, we provide learning dynamics for which the external regret of $x$-player (and similarly for $y$-player) is roughly bounded by $O(\log (m_x m_y) + \sqrt{\hat{C}_y} + \hat{C}_x)$, where $m_x$ and $m_y$ denote the number of actions of $x$- and $y$-players, respectively, and $\hat{C}_x$ and $\hat{C}_y$ represent their cumulative deviations. We then extend our approach to multi-player general-sum corrupted games, providing learning dynamics for which the swap regret of player $i$ is bounded by $O(\log T + \sqrt{\sum_{k} \hat{C}_k \log T} + \hat{C}_i)$ ignoring dependence on the number of players and actions, where $\hat{C}_i$ is the cumulative deviation of player $i$ from the prescribed algorithm. Our learning dynamics are agnostic to the levels of corruption. A key technical contribution is a new analysis that ensures the stability of a Markov chain under a new adaptive learning rate, thereby allowing us to achieve the desired bound in the corrupted regime while matching the best existing bound in the honest regime. Notably, our framework can be extended to address not only corruption in strategies but also corruption in the observed expected utilities, and we provide several matching lower bounds.
Abstract（参考訳）: ゲームでの学習は、複数のプレイヤーが共有環境で相互作用し、それぞれが後悔を最小限に抑えるシナリオを指す。すべてのプレイヤーが楽観的な追従型リーダー(OFTRL)に従うと、平衡は$O(1/T)$で計算できる。しかし、この加速は、すべてのプレイヤーが所定のアルゴリズムに従属する誠実な体制に限られる。この問題に対処するために,各プレイヤーが所定のアルゴリズムによって提案される戦略から逸脱する程度に依存する速度で,均衡を適応的に見つける学習ダイナミクスを提案する。まず、2人のプレイヤーのゼロサム不正なゲームでは、$x$-player(および$y$-player)の外部後悔が大まかに$O(\log (m_x m_y) + \sqrt{\hat{C}_y} + \hat{C}_x)$、$m_x$と$m_y$はそれぞれ$x$-と$y$-playersのアクションの数を表し、$\hat{C}_x$と$\hat{C}_y$は累積偏差を表す学習力学を提供する。プレイヤー $i$ のスワップ後悔が $O(\log T + \sqrt{\sum_{k} \hat{C}_k \log T} + \hat{C}_i)$ プレイヤー数やアクションへの依存を無視し、$\hat{C}_i$ が所定のアルゴリズムからプレイヤー $i$ を累積逸脱する学習力学を提供する。私たちの学習力学は汚職のレベルに非依存です。重要な技術的貢献は、新しい適応学習率の下でマルコフ連鎖の安定性を保証する新しい分析であり、これにより、誠実な体制における最良の既存の境界に適合しながら、腐敗した体制における望ましい境界を達成することができる。特に、我々のフレームワークは、戦略の腐敗だけでなく、観測された期待されたユーティリティの腐敗にも対処できる。

関連論文リスト

Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
純粋な戦略ナッシュ均衡が存在するとき、$c$ は 0 となり、最適のインスタンス依存後悔境界となることを示す。また,本アルゴリズムは最終段階の収束性も享受し,ほぼ最適サンプルを用いて純粋な戦略ナッシュ均衡を同定することができる。
論文参考訳（メタデータ） (2025-02-24T20:20:06Z)
Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
オンライン学習によるセルフプレイは、大規模な2人プレイのゼロサムゲームを解くための重要な方法の1つだ。我々は,OMWUが支払行列のサイズに対数依存するなど,いくつかの利点があることを示した。我々は、過去のことをすぐに忘れない幅広い種類のアルゴリズムが、すべて同じ問題に悩まされていることを証明している。
論文参考訳（メタデータ） (2024-06-15T13:26:17Z)
The complexity of approximate (coarse) correlated equilibrium for incomplete information games [16.96984593866157]
不完全情報ゲームにおける近似平衡の分散学習の複雑さについて検討する。我々の下限は、$epsilon$-approximate $mathitco$ correlation equilibriumというより簡単な解の概念にも当てはまる。
論文参考訳（メタデータ） (2024-06-04T14:35:27Z)
$\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games [8.215874655947335]
楽観的フォロー・ザ・レギュラライズド・リーダー・アルゴリズムは,フル情報汎用マルコフゲームにおいて,$widetildeO(T-1)$-approximate iterationsを$T$内で見つけることができることを示す。
論文参考訳（メタデータ） (2024-02-02T20:40:27Z)
Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
一般凸およびコンパクト戦略集合に対して後悔が得られることを示す。我々の力学は、適度にエンハンリフトされた空間上の楽観的な従順化バウンドのインスタンス化にある。先行結果が適用される特殊な場合であっても、我々のアルゴリズムは最先端の後悔よりも改善される。
論文参考訳（メタデータ） (2022-06-17T12:58:58Z)
Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games [121.50979258049135]
我々は、すべてのプレイヤーが、時空不変の学習速度で我々のダイナミクスに従うとき、時間$T$までの時空二階パス長は、$O(log T)$で有界であることを示す。提案する学習力学は, 直観的正規化学習と, 自己一致障壁を併用した新しい学習手法である。
論文参考訳（メタデータ） (2022-04-25T03:20:16Z)
Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
本稿では,2プレイヤーゼロサムゲームにおいて,$widetildemathcalO((XA+YB)/varepsilon2)$プレイのエピソードのみを必要とするアルゴリズムの最初の行を,$varepsilon$-approximate Nash平衡を求める。これにより$widetildemathcalO((X2A+Y2B)/varepsilon2)$が$widetildemathcalO(maxX,
論文参考訳（メタデータ） (2022-02-03T18:18:28Z)
When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently? [10.397170312149067]
本稿では,m$-player General-sum Markovゲームの設定において,学習目標がより優れたサンプル複雑性を許容するものについて検討する。まず,$epsilon$-Coarse Correlated Equilibrium (CCE)を$widetildemathcalO(H5Smax_ile m A_i / epsilon2)$ episodesで学習するアルゴリズムを設計する。第二に、マルコフポテンシャルゲームの重要な特別な場合を考え、$widet内で$epsilon$-approximate Nash平衡を学ぶアルゴリズムを設計する。
論文参考訳（メタデータ） (2021-10-08T15:06:22Z)
Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
我々は,直接の監督なしに自己対決で最適な政策を学習するセルフプレイアルゴリズムに焦点をあてる。本稿では,サンプル複雑性を$tildemathcalO(SAB)$,サンプル複雑性を$tildemathcalO(S(A+B)$とする新しいemphNash Vラーニングアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-22T05:00:13Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。