Near-Optimal Learning of Extensive-Form Games with Imperfect Information
- URL: http://arxiv.org/abs/2202.01752v3
- Date: Mon, 3 Apr 2023 17:34:25 GMT
- Title: Near-Optimal Learning of Extensive-Form Games with Imperfect Information
- Authors: Yu Bai, Chi Jin, Song Mei, Tiancheng Yu
- Abstract summary: We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
- Score: 54.55092907312749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper resolves the open question of designing near-optimal algorithms
for learning imperfect-information extensive-form games from bandit feedback.
We present the first line of algorithms that require only
$\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an
$\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where
$X,Y$ are the number of information sets and $A,B$ are the number of actions
for the two players. This improves upon the best known sample complexity of
$\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of
$\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic
lower bound up to logarithmic factors. We achieve this sample complexity by two
new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual
Regret Minimization. Both algorithms rely on novel approaches of integrating
\emph{balanced exploration policies} into their classical counterparts. We also
extend our results to learning Coarse Correlated Equilibria in multi-player
general-sum games.
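For intuition on the claimed improvement: since $X^2A \le \max\{X,Y\} \cdot XA$ and $Y^2B \le \max\{X,Y\} \cdot YB$, the ratio of the old rate to the new one satisfies
$$\frac{X^2A + Y^2B}{XA + YB} \le \max\{X, Y\},$$
which is exactly the stated $\widetilde{\mathcal{O}}(\max\{X, Y\})$ gap.
As a concrete illustration of the balanced-exploration idea, the sketch below is a simplified Python toy, not the paper's construction: the paper defines one balanced policy per layer $h$ via per-layer descendant counts $|C_h(\cdot)|$, whereas this sketch weights each action by the total number of descendant infosets, and the toy tree and infoset names are invented for the example.
```python
# A toy "balanced" exploration policy over one player's infoset tree.
# The tree maps each infoset to {action: [child infosets]}; all names
# here are illustrative, not from the paper.
TREE = {
    "root": {"a": ["x1", "x2"], "b": ["x3"]},
    "x1": {"fold": [], "call": []},
    "x2": {"fold": []},
    "x3": {"fold": [], "call": [], "raise": []},
}

def subtree_size(infoset):
    """Number of infosets in the subtree rooted at `infoset`, itself included."""
    return 1 + sum(subtree_size(child)
                   for children in TREE[infoset].values()
                   for child in children)

def balanced_policy(infoset):
    """Play each action with probability proportional to the number of
    infosets reachable under it (actions leading nowhere get weight 1),
    so exploration mass spreads roughly evenly over the subtree."""
    weights = {action: sum(subtree_size(c) for c in children) or 1
               for action, children in TREE[infoset].items()}
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}

print(balanced_policy("root"))  # {'a': 0.666..., 'b': 0.333...}
```
Spreading play in proportion to subtree sizes keeps the relevant importance weights under control across all infosets simultaneously, which is, loosely, the role the balanced policies play in the paper's analysis.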
Related papers
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix.
We prove that a broad class of algorithms that do not forget the past quickly all suffer from slow last-iterate convergence.
arXiv Detail & Related papers (2024-06-15T13:26:17Z) - The complexity of approximate (coarse) correlated equilibrium for incomplete information games [16.96984593866157]
We study the complexity of decentralized learning of approximate correlated equilibria in incomplete information games.
Our lower bound holds even for the easier solution concept of $\epsilon$-approximate coarse correlated equilibrium.
arXiv Detail & Related papers (2024-06-04T14:35:27Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Adapting to game trees in zero-sum imperfect information games [41.4836057085788]
Imperfect information games (IIG) are games in which each player only partially observes the current game state.
We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback.
arXiv Detail & Related papers (2022-12-23T19:45:25Z) - Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games [18.832436856339587]
This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data.
We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game.
arXiv Detail & Related papers (2022-06-08T17:58:06Z) - When Can We Learn General-Sum Markov Games with a Large Number of
Players Sample-Efficiently? [10.397170312149067]
This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games.
First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5 S \max_{i \le m} A_i / \epsilon^2)$ episodes.
Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{\mathcal{O}}(\cdots)$ episodes.
arXiv Detail & Related papers (2021-10-08T15:06:22Z) - Almost Optimal Algorithms for Two-player Markov Games with Linear
Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty".
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z) - Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision.
We propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an improved algorithm with a sharper sample complexity bound.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)