Near-Optimal No-Regret Learning for Correlated Equilibria in
Multi-Player General-Sum Games
- URL: http://arxiv.org/abs/2111.06008v1
- Date: Thu, 11 Nov 2021 01:19:53 GMT
- Title: Near-Optimal No-Regret Learning for Correlated Equilibria in
Multi-Player General-Sum Games
- Authors: Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina,
Maxwell Fishelson, Noah Golowich, Tuomas Sandholm
- Abstract summary: We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
- Score: 104.74734408204749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS`21) showed that
if all agents in a multi-player general-sum normal-form game employ Optimistic
Multiplicative Weights Update (OMWU), the external regret of every player is
$O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their
result from external regret to internal regret and swap regret, thereby
establishing uncoupled learning dynamics that converge to an approximate
correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially
improves over the prior best rate of convergence for correlated equilibria of
$O(T^{-3/4})$ due to Chen and Peng (NeurIPS`20), and it is optimal -- within
the no-regret framework -- up to polylogarithmic factors in $T$.
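  (For concreteness, here is a minimal sketch of the OMWU update on the probability simplex; the function name and the fixed step size $\eta$ are illustrative assumptions, not the tuning analyzed by DFG or in this paper.)

```python
import numpy as np

def omwu_step(x, loss_now, loss_prev, eta):
    """One Optimistic Multiplicative Weights Update (OMWU) step.

    x         : current mixed strategy (probability vector over actions)
    loss_now  : loss vector observed this round
    loss_prev : loss vector observed in the previous round
    eta       : step size (illustrative constant; the regret analysis
                relies on a specific choice/schedule)

    The optimistic term 2*loss_now - loss_prev uses the latest loss as a
    prediction of the next one; when every player runs these dynamics the
    predictions are accurate enough to yield polylog(T) external regret.
    """
    w = x * np.exp(-eta * (2.0 * loss_now - loss_prev))
    return w / w.sum()
```

  On the first round one can simply take loss_prev to be the zero vector, which recovers a plain multiplicative-weights step.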
To obtain these results, we develop new techniques for establishing
higher-order smoothness for learning dynamics involving fixed point operations.
Specifically, we establish that the no-internal-regret learning dynamics of
Stoltz and Lugosi (Mach Learn`05) are equivalently simulated by
no-external-regret dynamics on a combinatorial space. This allows us to trade
the computation of the stationary distribution on a polynomial-sized Markov
chain for a (much more well-behaved) linear transformation on an
exponential-sized set, enabling us to leverage techniques similar to those of DFG to
near-optimally bound the internal regret.
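  (As a point of reference, the fixed-point step in the classical internal-regret dynamics is the computation of a stationary distribution $p = pQ$ of a round-dependent row-stochastic matrix $Q$; the sketch below illustrates only that step, not the paper's reformulation as a linear transformation over the lifted combinatorial space. The helper name and the use of power iteration are assumptions made for illustration.)

```python
import numpy as np

def stationary_distribution(Q, tol=1e-12, max_iter=10_000):
    """Fixed point p = p @ Q of a row-stochastic matrix Q.

    In Stoltz-Lugosi-style internal-regret dynamics, Q is assembled each
    round from the recommendations of the per-pair external-regret
    learners, and the strategy actually played is this stationary
    distribution.  Power iteration is used purely for illustration; any
    eigenvector or linear-system solver would do, and it converges when Q
    is ergodic (e.g., when all of its entries are strictly positive).
    """
    n = Q.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = p @ Q
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```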
Moreover, we establish an $O(\textrm{polylog}(T))$ no-swap-regret bound for
the classic algorithm of Blum and Mansour (BM) (JMLR`07). We do so by
introducing a technique based on the Cauchy Integral Formula that circumvents
the more limited combinatorial arguments of DFG. In addition to shedding
clarity on the near-optimal regret guarantees of BM, our arguments provide
insights into the various ways in which the techniques by DFG can be extended
and leveraged in the analysis of more involved learning algorithms.
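  (To make the object of this bound concrete, below is a minimal sketch of the Blum-Mansour swap-regret reduction with plain multiplicative-weights sub-learners; the class name, step size, and the choice of Hedge as the sub-routine are illustrative assumptions, whereas the $O(\textrm{polylog}(T))$ bound above concerns the optimistic sub-learners analyzed in the paper.)

```python
import numpy as np

class BlumMansourSwap:
    """Sketch of the Blum-Mansour swap-regret reduction.

    One external-regret sub-learner is kept per action i.  Each round
    their recommended distributions are stacked into a row-stochastic
    matrix Q, the player acts according to the stationary distribution p
    of Q, and sub-learner i is then charged the loss vector scaled by p[i].
    """

    def __init__(self, n_actions, eta):
        self.n = n_actions
        self.eta = eta
        # weights[i] are the multiplicative weights of the i-th sub-learner
        self.weights = np.ones((n_actions, n_actions))

    def act(self):
        Q = self.weights / self.weights.sum(axis=1, keepdims=True)
        # stationary distribution p = p @ Q, via power iteration
        p = np.full(self.n, 1.0 / self.n)
        for _ in range(1000):
            p = p @ Q
        self.p = p
        return p

    def update(self, loss):
        # sub-learner i is responsible for the fraction p[i] of the play,
        # so it is charged the loss vector scaled by p[i]
        scaled = np.outer(self.p, np.asarray(loss, dtype=float))
        self.weights *= np.exp(-self.eta * scaled)
```

  A round of play then consists of p = learner.act(), observing the loss vector, and calling learner.update(loss).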
Related papers
- $\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games [8.215874655947335]
We show that an optimistic-follow-the-regularized-leader algorithm can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations.
arXiv Detail & Related papers (2024-02-02T20:40:27Z) - Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z) - Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games [85.78272987312343]
We establish efficient and uncoupled learning dynamics so that the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.
This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$.
arXiv Detail & Related papers (2022-08-20T20:48:58Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret bounds can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)