Efficient $\Phi$-Regret Minimization in Extensive-Form Games via Online
Mirror Descent
- URL: http://arxiv.org/abs/2205.15294v2
- Date: Thu, 2 Jun 2022 17:27:22 GMT
- Title: Efficient $\Phi$-Regret Minimization in Extensive-Form Games via Online
Mirror Descent
- Authors: Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu
- Abstract summary: $\Phi$-Hedge is a generic algorithm capable of learning a large class of equilibria for Normal-Form Games (NFGs).
We show that $\Phi$-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs.
We prove that, in those settings, the $\Phi$-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time.
- Score: 49.93548413166884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A conceptually appealing approach for learning Extensive-Form Games (EFGs) is
to convert them to Normal-Form Games (NFGs). This approach enables us to
directly translate state-of-the-art techniques and analyses in NFGs to learning
EFGs, but typically suffers from computational intractability due to the
exponential blow-up of the game size introduced by the conversion. In this
paper, we address this problem in natural and important setups for the
\emph{$\Phi$-Hedge} algorithm -- a generic algorithm capable of learning a
large class of equilibria for NFGs. We show that $\Phi$-Hedge can be directly
used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse
Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE)
in EFGs. We prove that, in those settings, the \emph{$\Phi$-Hedge} algorithms
are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with
suitable dilated regularizers, and run in polynomial time. This new connection
further allows us to design and analyze a new class of OMD algorithms based on
modifying its log-partition function. In particular, we design an improved
algorithm with balancing techniques that achieves a sharp
$\widetilde{\mathcal{O}}(\sqrt{XAT})$ EFCE-regret under bandit-feedback in an
EFG with $X$ information sets, $A$ actions, and $T$ episodes. To our best
knowledge, this is the first such rate and matches the information-theoretic
lower bound.
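
As an aside, the Hedge/OMD equivalence invoked above is easiest to see in normal-form games: the Hedge (exponential weights) update is exactly Online Mirror Descent with a negative-entropy regularizer. The minimal Python sketch below illustrates only that normal-form connection, not the paper's polynomial-time EFG algorithms with dilated regularizers; the payoff matrix, step size, and function names are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's EFG algorithm): the Hedge update on
# the probability simplex of a normal-form game, written as one step of Online
# Mirror Descent with the negative-entropy regularizer. Step size and payoff
# matrix below are hypothetical choices for the demo.
import numpy as np


def hedge_step(policy: np.ndarray, loss: np.ndarray, eta: float) -> np.ndarray:
    """One Hedge / exponential-weights step: p_i <- p_i * exp(-eta * loss_i),
    renormalized. This is exactly OMD with the entropy mirror map."""
    logits = np.log(policy) - eta * loss
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    return weights / weights.sum()


def run_hedge(A: np.ndarray, T: int = 2000, eta: float = 0.05):
    """Self-play Hedge in a two-player zero-sum NFG with loss matrix A
    (A[i, j] = row player's loss). Returns the time-averaged policies."""
    n, m = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        loss_x = A @ y        # row player's expected loss per action
        loss_y = -A.T @ x     # column player's loss is the negated payoff
        x, y = hedge_step(x, loss_x, eta), hedge_step(y, loss_y, eta)
        x_sum += x
        y_sum += y
    return x_sum / T, y_sum / T


if __name__ == "__main__":
    # Rock-paper-scissors (entries = row player's loss); the unique Nash is
    # uniform, so both averaged policies should approach (1/3, 1/3, 1/3).
    rps = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
    print(run_hedge(rps))
```

The paper's contribution is to show that this same identity extends to suitable $\Phi$ sets in EFGs, where the induced OMD updates use dilated regularizers and still run in polynomial time.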
Related papers
- Learning Regularized Graphon Mean-Field Games with Unknown Graphons [155.38727464526923]
We design reinforcement learning algorithms for Graphon Mean-Field Games (GMFGs).
We aim to learn the Nash Equilibrium (NE) of the regularized GMFGs when the graphons are unknown.
These algorithms are the first specifically designed for learning graphons from sampled agents.
arXiv Detail & Related papers (2023-10-26T16:19:24Z) - Learning Regularized Monotone Graphon Mean-Field Games [155.38727464526923]
We study fundamental problems in regularized Graphon Mean-Field Games (GMFGs).
We establish the existence of a Nash Equilibrium (NE) of any $\lambda$-regularized GMFG.
We propose provably efficient algorithms to learn the NE in weakly monotone GMFGs.
arXiv Detail & Related papers (2023-10-12T07:34:13Z) - Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z) - The Power of Regularization in Solving Extensive-Form Games [22.557806157585834]
We propose a series of new algorithms based on regularizing the payoff functions of the game.
In particular, we first show that dilated optimistic mirror descent (DOMD) can achieve a fast $\widetilde{O}(T^{-1})$ last-iterate convergence rate.
We also show that Reg-CFR, with a variant of optimistic mirror descent as its regret minimizer, can achieve $O(T^{-1/4})$ best-iterate and $O(T^{-3/4})$ average-iterate convergence rates.
arXiv Detail & Related papers (2022-06-19T22:10:38Z) - Sample-Efficient Learning of Correlated Equilibria in Extensive-Form
Games [22.94768429216664]
Imperfect-Information Extensive-Form Games (IIEFGs) are a prevalent model for real-world games involving imperfect information and sequential play.
The Extensive-Form Correlated Equilibrium (EFCE) has been proposed as a natural solution concept for multi-player general-sum IIEFGs.
This paper presents the first sample-efficient algorithm for learning the EFCE from bandit feedback.
arXiv Detail & Related papers (2022-05-15T08:55:28Z) - Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the
Gap Between Learning in Extensive-Form and Normal-Form Games [76.21916750766277]
We show that the Optimistic Multiplicative Weights Update (OMWU) algorithm can be simulated on the normal-form equivalent of an EFG, in linear time per iteration in the game tree size, using a kernel trick (a minimal OMWU sketch appears after this list).
The resulting Kernelized OMWU (KOMWU) gives the first algorithm that simultaneously guarantees last-iterate convergence, among other properties previously achievable only in NFGs.
arXiv Detail & Related papers (2022-02-01T06:28:51Z) - Near-Optimal No-Regret Learning for Correlated Equilibria in
Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this polylogarithmic guarantee from external regret to internal and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z) - An Operator Splitting View of Federated Learning [23.99238431431463]
In the past few years, the federated learning (FL) community has witnessed a proliferation of new FL algorithms.
This unified view allows us to compare different algorithms with ease, to relate them to previous convergence results, and to uncover new algorithmic variants.
The unification also leads to a way to accelerate FL algorithms without any additional overhead.
arXiv Detail & Related papers (2021-08-12T21:22:06Z)
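
For reference, the Optimistic Multiplicative Weights Update (OMWU) dynamic mentioned in the two entries above is the Hedge update applied to a predicted loss, with the previous loss vector serving as the prediction. The sketch below is a generic illustration under an assumed step size and a toy loss stream; it is not the KOMWU simulation or the near-optimal-regret analysis from the cited papers.

```python
# Hypothetical sketch of the plain Optimistic Multiplicative Weights Update
# (OMWU) referenced above; step size and the toy loss sequence are assumptions.
import numpy as np


def omwu_step(policy: np.ndarray, loss: np.ndarray, prev_loss: np.ndarray,
              eta: float) -> np.ndarray:
    """One OMWU step: a Hedge update on the optimistic loss 2*l_t - l_{t-1}
    (equivalently, optimistic FTRL with prediction m_{t+1} = l_t)."""
    predicted = 2.0 * loss - prev_loss
    logits = np.log(policy) - eta * predicted
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    return weights / weights.sum()


if __name__ == "__main__":
    # Toy usage on a fabricated loss stream (purely illustrative).
    rng = np.random.default_rng(0)
    x, prev = np.full(3, 1.0 / 3), np.zeros(3)
    for _ in range(100):
        loss = rng.uniform(size=3)
        x = omwu_step(x, loss, prev, eta=0.1)
        prev = loss
    print(x)
```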