Near-Optimal No-Regret Learning in General Games
- URL: http://arxiv.org/abs/2108.06924v1
- Date: Mon, 16 Aug 2021 06:53:02 GMT
- Title: Near-Optimal No-Regret Learning in General Games
- Authors: Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich
- Abstract summary: We show that Optimistic Hedge attains ${\rm poly}(\log T)$ regret in multi-player general-sum games.
A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac{1}{T}\right)$.
- Score: 31.293412884145066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that Optimistic Hedge -- a common variant of
multiplicative-weights-updates with recency bias -- attains ${\rm poly}(\log
T)$ regret in multi-player general-sum games. In particular, when every player
of the game uses Optimistic Hedge to iteratively update her strategy in
response to the history of play so far, then after $T$ rounds of interaction,
each player experiences total regret that is ${\rm poly}(\log T)$. Our bound
improves, exponentially, the $O({T}^{1/2})$ regret attainable by standard
no-regret learners in games, the $O(T^{1/4})$ regret attainable by no-regret
learners with recency bias (Syrgkanis et al., 2015), and the ${O}(T^{1/6})$
bound that was recently shown for Optimistic Hedge in the special case of
two-player games (Chen & Pen, 2020). A corollary of our bound is that
Optimistic Hedge converges to coarse correlated equilibrium in general games at
a rate of $\tilde{O}\left(\frac 1T\right)$.
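For intuition, the update described above admits a very short implementation: each player weights actions by their cumulative losses plus one extra copy of the most recent loss vector (the recency bias). The sketch below is illustrative only; the callback `loss_fn`, the fixed learning rate `eta`, and all names are assumptions of this sketch, and the learning-rate schedule used in the paper's analysis is not reproduced.

```python
import numpy as np

def optimistic_hedge(loss_fn, n_actions, T, eta=0.1):
    """Sketch of Optimistic Hedge: multiplicative weights in which the
    most recent loss vector is counted twice (recency bias)."""
    cum_loss = np.zeros(n_actions)   # sum of all loss vectors seen so far
    prev_loss = np.zeros(n_actions)  # last round's loss vector
    strategies = []
    for t in range(T):
        # Optimistic step: act as if last round's loss will repeat.
        logits = -eta * (cum_loss + prev_loss)
        x = np.exp(logits - logits.max())  # subtract max for stability
        x /= x.sum()
        loss = loss_fn(x, t)  # environment/game returns a loss per action
        cum_loss += loss
        prev_loss = loss
        strategies.append(x)
    return strategies
```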
Related papers
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix.
We prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue.
arXiv Detail & Related papers (2024-06-15T13:26:17Z)
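The entry above concerns OMWU self-play in two-player zero-sum games. As a hedged illustration, not the paper's algorithm, a minimal OMWU self-play loop on a payoff matrix `A` (an assumed input) can use the incremental form that multiplies in $2g_t - g_{t-1}$, i.e., counts the latest gradient twice:

```python
import numpy as np

def omwu_self_play(A, T, eta=0.05):
    """Sketch: Optimistic MWU self-play on a zero-sum game with payoff
    matrix A; the row player maximizes x^T A y, the column player minimizes."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    gx_prev, gy_prev = np.zeros(m), np.zeros(n)
    for _ in range(T):
        gx, gy = A @ y, A.T @ x  # payoff gradients at the current profile
        x = x * np.exp(eta * (2 * gx - gx_prev))   # optimistic correction
        x /= x.sum()
        y = y * np.exp(-eta * (2 * gy - gy_prev))  # minimizer descends
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y
```

On matching pennies, `A = np.array([[1., -1.], [-1., 1.]])`, the iterates should spiral in toward the uniform equilibrium, in line with the last-iterate convergence this line of work studies.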
- $\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games [8.215874655947335]
We show that an optimistic-follow-the-regularized-leader algorithm can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations.
arXiv Detail & Related papers (2024-02-02T20:40:27Z)
- Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games [72.43065138581589]
We study the last-iterate convergence properties of various popular variants of RM$^+$.
We show numerically that several practical variants, such as simultaneous RM$^+$, alternating RM$^+$, and predictive RM$^+$, all lack last-iterate convergence guarantees.
We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence.
arXiv Detail & Related papers (2023-11-01T17:34:58Z)
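All of the RM$^+$ variants named above share one core update: accumulate instantaneous regrets, clip at zero, and play proportionally to the result. A minimal sketch of plain RM$^+$ follows; `utility_fn` is an assumed callback, and the predictive and smoothed variants from the paper are not reproduced here.

```python
import numpy as np

def rm_plus(utility_fn, n_actions, T):
    """Sketch of Regret Matching+: play proportionally to the positive
    part of accumulated regrets, clipping negative regret each round."""
    R = np.zeros(n_actions)  # clipped cumulative regrets
    strategies = []
    for t in range(T):
        x = R / R.sum() if R.sum() > 0 else np.ones(n_actions) / n_actions
        u = utility_fn(x, t)        # utility of each pure action this round
        r = u - u @ x               # instantaneous regret vector
        R = np.maximum(R + r, 0.0)  # the "+" clip that forgets negative regret
        strategies.append(x)
    return strategies
```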
- Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z)
- Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games [85.78272987312343]
We establish efficient and uncoupled learning dynamics so that the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.
This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$.
arXiv Detail & Related papers (2022-08-20T20:48:58Z)
- Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z)
- Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games [121.50979258049135]
We show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time $T$ are bounded by $O(\log T)$.
Our proposed learning dynamics combine, in a novel way, optimistic regularized learning with the use of self-concordant barriers.
arXiv Detail & Related papers (2022-04-25T03:20:16Z)
- Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of smooth and strongly monotone games and investigate optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z)
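Under bandit feedback, a player observes only the scalar cost of the point actually played. A standard building block in this literature, offered here as a hedged sketch rather than the paper's exact method, is the one-point gradient estimator followed by a projected gradient step; `cost_fn` and `project` are assumed callbacks.

```python
import numpy as np

def one_point_bandit_step(z, cost_fn, delta, step, project):
    """Sketch of one round of bandit (zeroth-order) learning: perturb the
    iterate, observe a single scalar cost, and build a gradient estimate
    of a delta-smoothed cost from that one evaluation."""
    n = z.shape[0]
    u = np.random.randn(n)
    u /= np.linalg.norm(u)            # uniform random direction on the sphere
    cost = cost_fn(z + delta * u)     # the only feedback: one scalar cost
    g_hat = (n / delta) * cost * u    # one-point gradient estimate
    return project(z - step * g_hat)  # projected gradient descent step
```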
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this guarantee from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
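Internal and swap regret are classically reduced to external regret by running one external-regret learner per action and playing a stationary distribution of the resulting row-stochastic recommendation matrix (Blum & Mansour, 2007); whether the paper above instantiates exactly this construction is not claimed here. A sketch of the fixed-point step:

```python
import numpy as np

def stationary_strategy(Q, iters=1000):
    """Sketch: given a row-stochastic matrix Q whose row a is the strategy
    proposed by the external-regret learner responsible for action a, play
    a fixed point x = x Q (a stationary distribution of the chain Q)."""
    n = Q.shape[0]
    x = np.ones(n) / n
    for _ in range(iters):  # power iteration; assumes the chain mixes
        x = x @ Q
        x /= x.sum()        # guard against numerical drift
    return x
```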
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.