Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer
Games
- URL: http://arxiv.org/abs/2204.11417v1
- Date: Mon, 25 Apr 2022 03:20:16 GMT
- Title: Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer
Games
- Authors: Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee,
Haipeng Luo, Tuomas Sandholm
- Abstract summary: We show that when all players follow our dynamics with a \emph{time-invariant} learning rate, the \emph{second-order path lengths} of the dynamics up to time $T$ are bounded by $O(\log T)$.
Our proposed learning dynamics combine in a novel way \emph{optimistic} regularized learning with the use of \emph{self-concordant barriers}.
- Score: 121.50979258049135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we establish efficient and \emph{uncoupled} learning dynamics
so that, when employed by all players in a general-sum multiplayer game, the
\emph{swap regret} of each player after $T$ repetitions of the game is bounded
by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$. At the
same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial
regime as well. To obtain these results, our primary contribution is to show
that when all players follow our dynamics with a \emph{time-invariant} learning
rate, the \emph{second-order path lengths} of the dynamics up to time $T$ are
bounded by $O(\log T)$, a fundamental property which could have further
implications beyond near-optimally bounding the (swap) regret. Our proposed
learning dynamics combine in a novel way \emph{optimistic} regularized learning
with the use of \emph{self-concordant barriers}. Further, our analysis is
remarkably simple, bypassing the cumbersome framework of higher-order
smoothness recently developed by Daskalakis, Fishelson, and Golowich
(NeurIPS'21).
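To make the two ingredients concrete, here is a minimal, illustrative sketch (not the authors' exact dynamics) of optimistic follow-the-regularized-leader on the probability simplex with the logarithmic barrier, a standard self-concordant barrier, run by both players of a zero-sum matrix game. The step size, the helper simplex_barrier_argmin, and the demo game are our own assumptions.

```python
import numpy as np

def simplex_barrier_argmin(g, eta):
    """Solve min_x eta*<g, x> - sum_i log(x_i) over the simplex.

    Stationarity gives x_i = 1/(eta*g_i + lam); we bisect on the
    multiplier lam until the coordinates sum to one."""
    lo = -eta * np.min(g) + 1e-12      # keeps every denominator positive
    hi = lo + 2.0 * len(g)             # large enough that sum_i x_i < 1
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.sum(1.0 / (eta * g + lam)) > 1.0:
            lo = lam
        else:
            hi = lam
    x = 1.0 / (eta * g + 0.5 * (lo + hi))
    return x / np.sum(x)

def optimistic_ftrl_matrix_game(A, T=2000, eta=0.05):
    """Both players run optimistic FTRL with the log-barrier regularizer
    in the zero-sum game with payoff matrix A (row player minimizes x^T A y).
    The last observed gradient doubles as the optimistic prediction."""
    m, n = A.shape
    cum_x, cum_y = np.zeros(m), np.zeros(n)
    pred_x, pred_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x = simplex_barrier_argmin(cum_x + pred_x, eta)
        y = simplex_barrier_argmin(cum_y + pred_y, eta)
        gx, gy = A @ y, -A.T @ x       # each player's loss gradient
        cum_x, cum_y = cum_x + gx, cum_y + gy
        pred_x, pred_y = gx, gy
    return x, y

x, y = optimistic_ftrl_matrix_game(np.array([[0.0, 1.0], [1.0, 0.0]]))
print(x, y)  # both mixtures should drift toward the uniform equilibrium
```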
Related papers
- Non-stationary Projection-free Online Learning with Dynamic and Adaptive
Regret Guarantees [36.746745619968024]
We investigate non-stationary projection-free online learning, choosing dynamic regret and adaptive regret as the performance measures.
Our results are the first general-case dynamic regret bounds for projection-free online learning, and they recover the existing $\mathcal{O}(T^{3/4})$ static regret.
We propose a projection-free method that attains an $\tilde{\mathcal{O}}(\tau^{3/4})$ adaptive regret bound for any interval of length $\tau$.
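As generic background on the projection-free template (not this paper's specific method), one step of online Frank-Wolfe replaces the projection with a single call to a linear optimization oracle; the oracle interface and step size below are illustrative assumptions.

```python
import numpy as np

def online_frank_wolfe_step(x, grad_sum, t, lin_oracle):
    """One generic online Frank-Wolfe step: instead of projecting, call a
    linear optimization oracle over the feasible set K and move toward
    its answer. lin_oracle(c) should return argmin_{v in K} <c, v>."""
    v = lin_oracle(grad_sum)           # single linear optimization call
    gamma = 1.0 / np.sqrt(t + 1.0)     # an illustrative step-size choice
    return (1.0 - gamma) * x + gamma * v
```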
arXiv Detail & Related papers (2023-05-19T15:02:10Z)
- Doubly Optimal No-Regret Learning in Monotone Games [10.760195409078591]
We propose the first doubly optimal no-regret learning algorithm for smooth monotone games.
Our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium.
arXiv Detail & Related papers (2023-01-30T17:55:53Z)
- Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that $O(\log T)$ regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately \emph{lifted} space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds.
arXiv Detail & Related papers (2022-06-17T12:58:58Z)
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
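The route from external to swap regret is typically the classical reduction of Blum and Mansour: keep one external-regret learner per action, play the stationary distribution of the row-stochastic matrix formed by their recommendations, and split the observed loss among the learners. A minimal sketch of that construction follows (our naming, not the paper's code).

```python
import numpy as np

def stationary_play(Q):
    """Play the stationary distribution p = pQ of the row-stochastic
    matrix Q, whose row a is the distribution proposed by the
    external-regret learner in charge of action a."""
    vals, vecs = np.linalg.eig(Q.T)
    p = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    p = np.abs(p)
    return p / p.sum()

def split_losses(p, loss):
    """Feed learner a the scaled loss p[a] * loss, so that the sum of
    the learners' external regrets upper-bounds the swap regret."""
    return [p[a] * loss for a in range(len(p))]

# Tiny usage example with a fixed recommendation matrix.
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
p = stationary_play(Q)
print(p, split_losses(p, np.array([1.0, 0.0])))
```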
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
- Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z)
- Near-Optimal No-Regret Learning in General Games [31.293412884145066]
We show that Optimistic Hedge attains ${\rm poly}(\log T)$ regret in multi-player general-sum games.
A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac{1}{T}\right)$.
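For reference, Optimistic Hedge is ordinary Hedge with the most recent loss vector counted one extra time, serving as a prediction of the next loss; below is a minimal sketch with our own parameter choices.

```python
import numpy as np

def optimistic_hedge(loss_vectors, eta=0.1):
    """Optimistic Hedge on the simplex: play weights proportional to
    exp(-eta * (cumulative losses + last loss)), i.e. the latest loss
    vector also serves as the optimistic prediction of the next one."""
    cum = np.zeros(len(loss_vectors[0]))
    prev = np.zeros_like(cum)
    plays = []
    for g in loss_vectors:
        w = np.exp(-eta * (cum + prev))
        plays.append(w / w.sum())
        cum += g
        prev = g
    return plays

# Usage: a short adversarial loss sequence on two actions.
print(optimistic_hedge([np.array([1.0, 0.0]), np.array([0.0, 1.0])])[-1])
```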
arXiv Detail & Related papers (2021-08-16T06:53:02Z)
- Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon [88.75843804630772]
Episodic reinforcement learning generalizes contextual bandits.
A long planning horizon and unknown state-dependent transitions pose little additional difficulty in terms of sample complexity.
We show that MVP enjoys an $O\left(\sqrt{SAK} + S^2A\right)$ regret, approaching the $\Omega\left(\sqrt{SAK} + S^2A\right)$ lower bound of \emph{contextual bandits} up to logarithmic terms.
arXiv Detail & Related papers (2020-09-28T17:52:32Z)
- Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision.
We propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z)