Fugu-MT 論文翻訳(概要): Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

論文の概要: Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

arxiv url: http://arxiv.org/abs/2111.06008v1
Date: Thu, 11 Nov 2021 01:19:53 GMT
ステータス: 翻訳完了
システム内更新日: 2021-11-12 14:59:36.105990
Title: Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games
Title（参考訳）: マルチプレイヤー・ジェネラルサムゲームにおける相関平衡の近似的no-regret学習
Authors: Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm
Abstract要約: マルチプレイヤーの汎用正規形式ゲームにおいて,OMWU(Optimistic Multiplicative Weights Update)を用いているエージェントが全員,O(textrmpolylog(T))$(T$)$(T$)$(OMWU)$(OMWU)$(OMWU)$(OMWU)$(OMWU)$)であることを示す。外部の後悔から内部の後悔へと結果を拡張し、後悔を交換することで、近似した平衡に収束する非結合学習ダイナミクスを確立する。
参考スコア（独自算出の注目度）: 104.74734408204749
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially improves over the prior best rate of convergence for correlated equilibria of $O(T^{-3/4})$ due to Chen and Peng (NeurIPS`20), and it is optimal -- within the no-regret framework -- up to polylogarithmic factors in $T$. To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed point operations. Specifically, we establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn`05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (much more well-behaved) linear transformation on an exponential-sized set, enabling us to leverage similar techniques as DGF to near-optimally bound the internal regret. Moreover, we establish an $O(\textrm{polylog}(T))$ no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR`07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to shedding clarity on the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques by DFG can be extended and leveraged in the analysis of more involved learning algorithms.
Abstract（参考訳）: 最近、Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) は、マルチプレイヤーの汎用正規形式ゲームにおける全てのエージェントがOptimistic Multiplicative Weights Update (OMWU) を使用している場合、全てのプレイヤーの外部後悔は、ゲームの繰り返しの後で$O(\textrm{polylog}(T)$であることを示した。その結果を外部の後悔から内部後悔に拡張し,後悔を交換することで,約相関平衡に収束する非結合学習ダイナミクスを,$\tilde{o}(t^{-1})$で確立する。これは、Chen and Peng (NeurIPS`20) による$O(T^{-3/4})$の相関平衡に対する事前の収束率よりも大幅に改善され、非回帰フレームワーク内では、$T$のポリ対数因子まで最適である。これらの結果を得るために,不動点演算を含む学習ダイナミクスの高次滑らか性を確立するための新しい手法を開発した。具体的には、stltz と lugosi (mach learn`05) の非内在回帰学習ダイナミクスは、組合せ空間上の非外在回帰ダイナミクスによって等価にシミュレートされる。これにより、多項式サイズのマルコフ連鎖上の定常分布の計算を指数関数型集合上の(かなりよく考えられた)線型変換と交換することができ、dgfのような類似の手法を内部後悔をほぼ最適に束ねることができる。さらに、Blum and Mansour (BM) (JMLR`07) の古典的アルゴリズムに対して、$O(\textrm{polylog}(T))$ no-swap-regret を定めている。我々は,dfg のより限定的な組合せ論を回避し,コーシー積分式に基づく手法を導入する。本論は,BMのほぼ最適後悔の保証に対する明瞭さの隠蔽に加えて,DFGによる手法を拡張し,より関連する学習アルゴリズムの分析に活用する様々な方法についての知見を提供する。

論文の概要: Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

関連論文リスト