Fugu-MT 論文翻訳(概要): Corrupted Learning Dynamics in Games

論文の概要: Corrupted Learning Dynamics in Games

arxiv url: http://arxiv.org/abs/2412.07120v1
Date: Tue, 10 Dec 2024 02:23:44 GMT
ステータス: 翻訳完了
システム内更新日: 2024-12-11 22:09:45.194551
Title: Corrupted Learning Dynamics in Games
Title（参考訳）: ゲームにおける破壊的学習ダイナミクス
Authors: Taira Tsuchiya, Shinji Ito, Haipeng Luo,
Abstract要約: ゲームでの学習は、複数のプレイヤーが共有環境で相互作用する問題である。すべてのプレイヤーが非regretアルゴリズムを使用するとき、近似平衡が得られることが知られている。本稿では,各プレイヤーの偏差度に依存する速度で適応的に平衡を求める学習力学について述べる。
参考スコア（独自算出の注目度）: 62.73758165845971
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning in games is the problem where multiple players interact in a shared environment, each aiming to minimize their own regret, and it is known that an approximate equilibrium can be obtained when all players employ no-regret algorithms. Notably, by adopting optimistic follow-the-regularized-leader (OFTRL), the regret of each player after $T$ rounds is constant in two-player zero-sum games, implying that an equilibrium can be computed at a faster rate of $O(1/T)$. However, this acceleration is limited to the honest regime, where all players fully adhere to the given algorithms. To address this limitation, this paper presents corrupted learning dynamics that adaptively find an equilibrium at a rate dependent on the degree of deviation by each player from the given algorithm's output. First, in two-player zero-sum games, we provide learning dynamics where the external regret of the x-player (and similarly for the y-player) in the corrupted regime is roughly bounded by $O(\log (m_\mathrm{x} m_\mathrm{y}) + \sqrt{C_\mathrm{y}} + C_\mathrm{x})$, which implies a convergence rate of $\tilde{O}((\sqrt{C_\mathrm{y}} + C_\mathrm{x})/T)$ to a Nash equilibrium. Here, $m_\mathrm{x}$ and $m_\mathrm{y}$ are the number of actions of the x- and y-players, respectively, and $C_\mathrm{x}$ and $C_\mathrm{y}$ are the cumulative deviations of the x- and y-players from their given algorithms. Furthermore, we extend our approach to multi-player general-sum games, showing that the swap regret of player $i$ in the corrupted regime is bounded by $O(\log T + \sqrt{\sum_j C_j \log T} + C_i)$, where $C_i$ is the cumulative deviations of player $i$ from the given algorithm. This implies a convergence rate of $O((\log T + \sqrt{\sum_j C_j \log T} + C_i)/T)$ to a correlated equilibrium. Our learning dynamics are agnostic to the corruption levels and are based on OFTRL with new adaptive learning rates.
Abstract（参考訳）: ゲームにおける学習は、複数のプレイヤーが共有環境で相互作用する問題であり、それぞれが自分の後悔を最小限に抑えることを目的としており、全てのプレイヤーが非レグレットアルゴリズムを使用する場合、近似均衡が得られることが知られている。特に、楽観的なフォロー・ザ・レギュラライズド・リーダー(OFTRL)を採用することで、T$ラウンド後の各プレイヤーの後悔は2プレイヤーゼロサムゲームでは一定であり、平衡がより速い速度でO(1/T)$で計算できることを意味する。しかし、この加速は全てのプレイヤーが与えられたアルゴリズムに完全に従属する正直な体制に限られる。この制限に対処するため,本稿では,与えられたアルゴリズムの出力から各プレイヤーの偏差度に依存する速度で平衡を適応的に発見する学習力学について述べる。まず、2つのプレイヤーゼロサムゲームにおいて、腐敗した状態におけるx-プレイヤの外部の後悔(およびy-プレイヤについても同様に)が、大まかに$O(\log (m_\mathrm{x} m_\mathrm{y}) + \sqrt{C_\mathrm{y}} + C_\mathrm{x})$で制限される学習力学を提供し、これは$\tilde{O}((\sqrt{C_\mathrm{y}} + C_\mathrm{x})/T)$の収束率を意味する。ここで、$m_\mathrm{x}$と$m_\mathrm{y}$はそれぞれx-とy-プレーヤのアクションの数であり、$C_\mathrm{x}$と$C_\mathrm{y}$は与えられたアルゴリズムからx-とy-プレイヤの累積偏差である。さらに、我々は、マルチプレイヤーの汎用ゲームにアプローチを拡張し、腐敗した状態におけるプレイヤー$i$のスワップ後悔を$O(\log T + \sqrt{\sum_j C_j \log T} + C_i)$で束縛し、そこで、$C_i$はプレイヤー$i$の与えられたアルゴリズムからの累積偏差であることを示す。これは、$O((\log T + \sqrt{\sum_j C_j \log T} + C_i)/T)$の相関平衡への収束率を意味する。我々の学習力学は汚職レベルに非依存であり、新たな適応学習率を持つOFTRLに基づいている。

関連論文リスト

Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
純粋な戦略ナッシュ均衡が存在するとき、$c$ は 0 となり、最適のインスタンス依存後悔境界となることを示す。また,本アルゴリズムは最終段階の収束性も享受し,ほぼ最適サンプルを用いて純粋な戦略ナッシュ均衡を同定することができる。
論文参考訳（メタデータ） (2025-02-24T20:20:06Z)
Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
オンライン学習によるセルフプレイは、大規模な2人プレイのゼロサムゲームを解くための重要な方法の1つだ。我々は,OMWUが支払行列のサイズに対数依存するなど,いくつかの利点があることを示した。我々は、過去のことをすぐに忘れない幅広い種類のアルゴリズムが、すべて同じ問題に悩まされていることを証明している。
論文参考訳（メタデータ） (2024-06-15T13:26:17Z)
The complexity of approximate (coarse) correlated equilibrium for incomplete information games [16.96984593866157]
不完全情報ゲームにおける近似平衡の分散学習の複雑さについて検討する。我々の下限は、$epsilon$-approximate $mathitco$ correlation equilibriumというより簡単な解の概念にも当てはまる。
論文参考訳（メタデータ） (2024-06-04T14:35:27Z)
$\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games [8.215874655947335]
楽観的フォロー・ザ・レギュラライズド・リーダー・アルゴリズムは,フル情報汎用マルコフゲームにおいて,$widetildeO(T-1)$-approximate iterationsを$T$内で見つけることができることを示す。
論文参考訳（メタデータ） (2024-02-02T20:40:27Z)
Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
一般凸およびコンパクト戦略集合に対して後悔が得られることを示す。我々の力学は、適度にエンハンリフトされた空間上の楽観的な従順化バウンドのインスタンス化にある。先行結果が適用される特殊な場合であっても、我々のアルゴリズムは最先端の後悔よりも改善される。
論文参考訳（メタデータ） (2022-06-17T12:58:58Z)
Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games [121.50979258049135]
我々は、すべてのプレイヤーが、時空不変の学習速度で我々のダイナミクスに従うとき、時間$T$までの時空二階パス長は、$O(log T)$で有界であることを示す。提案する学習力学は, 直観的正規化学習と, 自己一致障壁を併用した新しい学習手法である。
論文参考訳（メタデータ） (2022-04-25T03:20:16Z)
Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
本稿では,2プレイヤーゼロサムゲームにおいて,$widetildemathcalO((XA+YB)/varepsilon2)$プレイのエピソードのみを必要とするアルゴリズムの最初の行を,$varepsilon$-approximate Nash平衡を求める。これにより$widetildemathcalO((X2A+Y2B)/varepsilon2)$が$widetildemathcalO(maxX,
論文参考訳（メタデータ） (2022-02-03T18:18:28Z)
When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently? [10.397170312149067]
本稿では,m$-player General-sum Markovゲームの設定において,学習目標がより優れたサンプル複雑性を許容するものについて検討する。まず,$epsilon$-Coarse Correlated Equilibrium (CCE)を$widetildemathcalO(H5Smax_ile m A_i / epsilon2)$ episodesで学習するアルゴリズムを設計する。第二に、マルコフポテンシャルゲームの重要な特別な場合を考え、$widet内で$epsilon$-approximate Nash平衡を学ぶアルゴリズムを設計する。
論文参考訳（メタデータ） (2021-10-08T15:06:22Z)
Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
同時動作による2プレイヤーゼロサムマルコフゲームの強化学習について検討した。我々は,「不確かさの最適性」に基づくアルゴリズムナッシュ-UCRL-VTRを提案する。我々は、Nash-UCRL-VTR が $tildeO(dHsqrtT)$ regret を確実に達成できることを示し、$d$ は線型関数次元である。
論文参考訳（メタデータ） (2021-02-15T09:09:16Z)
Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
我々は,直接の監督なしに自己対決で最適な政策を学習するセルフプレイアルゴリズムに焦点をあてる。本稿では,サンプル複雑性を$tildemathcalO(SAB)$,サンプル複雑性を$tildemathcalO(S(A+B)$とする新しいemphNash Vラーニングアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-22T05:00:13Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。