Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games
- URL: http://arxiv.org/abs/2208.09747v3
- Date: Tue, 19 Sep 2023 13:42:26 GMT
- Title: Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games
- Authors: Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm
- Abstract summary: We establish efficient and uncoupled learning dynamics so that the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.
This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$.
- Score: 85.78272987312343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we establish efficient and uncoupled learning dynamics so
that, when employed by all players in multiplayer perfect-recall
imperfect-information extensive-form games, the trigger regret of each player
grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially
over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a
recent open question by Bai et al. (2022). As an immediate consequence, we
guarantee convergence to the set of extensive-form correlated equilibria and
coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$.
Building on prior work, at the heart of our construction lies a more general
result regarding fixed points deriving from rational functions with polynomial
degree, a property that we establish for the fixed points of (coarse) trigger
deviation functions. Moreover, our construction leverages a refined regret
circuit for the convex hull, which -- unlike prior guarantees -- preserves the
RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation is of
independent interest for establishing near-optimal regret under learning
dynamics based on a CFR-type decomposition of the regret.
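For context, the RVU (Regret bounded by Variation in Utilities) property invoked above is the standard condition of Syrgkanis et al. (NIPS, 2015), not something specific to this paper: a regret minimizer producing iterates $x^1, \dots, x^T$ against utility vectors $u^1, \dots, u^T$ satisfies RVU with parameters $\alpha, \beta, \gamma > 0$ under a norm pair $(\|\cdot\|, \|\cdot\|_*)$ if $\mathrm{Reg}^T \le \alpha + \beta \sum_{t=1}^{T} \|u^t - u^{t-1}\|_*^2 - \gamma \sum_{t=1}^{T} \|x^t - x^{t-1}\|^2$. The negative term is what makes the property compose in self-play: each player's utility variation is controlled by the other players' iterate variation, which the negative terms absorb, which is how RVU-type arguments reach the near-optimal rates quoted above.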
Related papers
- $\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games [8.215874655947335]
We show that an optimistic follow-the-regularized-leader algorithm can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations.
arXiv Detail & Related papers (2024-02-02T20:40:27Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
- Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret bounds can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds.
arXiv Detail & Related papers (2022-06-17T12:58:58Z)
- The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game (a minimal OMWU sketch follows this list).
We extend this guarantee from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
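As referenced in the last entry above, here is a minimal, self-contained sketch of OMWU self-play in a random two-player normal-form game with full utility feedback. It illustrates only the update rule; the payoff matrices, step size $\eta$, and horizon are made-up assumptions, and none of this code comes from the papers listed here.

```python
import numpy as np

def omwu(cum_u, prev_u, eta):
    # Optimistic FTRL with entropic regularization: softmax of the cumulative
    # utilities plus the last utility vector, which serves as an optimistic
    # prediction of the coming round's utilities.
    z = eta * (cum_u + prev_u)
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
n, T, eta = 4, 5000, 0.1              # hypothetical game size, horizon, step size
A = rng.uniform(-1.0, 1.0, (n, n))    # made-up payoff matrix for player 1
B = rng.uniform(-1.0, 1.0, (n, n))    # made-up payoff matrix for player 2

cum1, prev1, val1 = np.zeros(n), np.zeros(n), 0.0
cum2, prev2, val2 = np.zeros(n), np.zeros(n), 0.0
for t in range(T):
    x = omwu(cum1, prev1, eta)        # player 1's mixed strategy this round
    y = omwu(cum2, prev2, eta)        # player 2's mixed strategy this round
    u1, u2 = A @ y, B.T @ x           # expected utility of each pure action
    val1 += x @ u1; cum1 += u1; prev1 = u1
    val2 += y @ u2; cum2 += u2; prev2 = u2

# External regret: best fixed action in hindsight minus realized utility.
print("player 1 external regret:", cum1.max() - val1)
print("player 2 external regret:", cum2.max() - val2)
```

Dropping the `prev_u` prediction term recovers vanilla multiplicative weights, whose per-player regret is only $O(\sqrt{T})$ in general; the optimistic prediction is what the polylogarithmic and logarithmic self-play guarantees discussed above rely on.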