Adapting to game trees in zero-sum imperfect information games
- URL: http://arxiv.org/abs/2212.12567v1
- Date: Fri, 23 Dec 2022 19:45:25 GMT
- Title: Adapting to game trees in zero-sum imperfect information games
- Authors: Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
- Abstract summary: Imperfect information games (IIG) are games in which each player only partially observes the current game state.
We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback.
- Score: 41.4836057085788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imperfect information games (IIG) are games in which each player only
partially observes the current game state. We study how to learn
$\epsilon$-optimal strategies in a zero-sum IIG through self-play with
trajectory feedback. We give a problem-independent lower bound
$\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required
number of realizations to learn these strategies with high probability, where
$H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the
total numbers of actions available to the two players, respectively. We also
propose two Follow the Regularized Leader (FTRL) algorithms for this setting:
Balanced-FTRL, which matches this lower bound but requires knowledge of the
information-set structure beforehand to define the regularization, and
Adaptive-FTRL, which
needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays
without this requirement by progressively adapting the regularization to the
observations.
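Both algorithms instantiate the generic Follow the Regularized Leader scheme, which at each round plays the strategy minimizing the cumulative (estimated) loss plus a regularizer. Below is a minimal single-decision-point sketch with a negative-entropy regularizer, whose closed form is the exponential-weights distribution; the step size `eta`, the uniform placeholder losses, and the full-information update are illustrative assumptions, not the paper's balanced or adaptive regularization over the tree of information sets.

```python
import numpy as np

def ftrl_entropy(cum_losses: np.ndarray, eta: float) -> np.ndarray:
    """FTRL over the simplex with a negative-entropy regularizer.

    Solves argmin_p <p, cum_losses> + (1/eta) * sum_a p[a] * log p[a],
    whose closed form is the exponential-weights distribution.
    """
    logits = -eta * cum_losses
    logits -= logits.max()        # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Toy loop at a single decision point with A actions.
rng = np.random.default_rng(0)
A, T, eta = 4, 1000, 0.1
cum_losses = np.zeros(A)
for t in range(T):
    p = ftrl_entropy(cum_losses, eta)
    a = rng.choice(A, p=p)           # sample an action to play
    losses = rng.uniform(size=A)     # placeholder losses; the paper instead
                                     # builds loss estimates from trajectory
                                     # feedback over the game tree
    cum_losses += losses             # full-information update for simplicity
```

In the paper, an update of this kind runs at every information set of the game tree, and the choice of regularization weights is what distinguishes Balanced-FTRL (fixed weights derived from the known information-set structure) from Adaptive-FTRL (weights progressively adapted to the observations).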
Related papers
- Corrupted Learning Dynamics in Games [62.73758165845971]
An equilibrium can be computed at a fast rate of $O(\log T)$ when all players follow the optimistic follow-the-regularized-leader (OFTRL) algorithm (a sketch of the update is below).
We present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm.
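For reference, the optimistic variant differs from plain FTRL only in adding a prediction of the next loss, commonly the most recent loss vector; below is a minimal sketch under that recency-prediction assumption (the corruption-robust dynamics of the paper are not reproduced).

```python
import numpy as np

def oftrl_entropy(cum_losses: np.ndarray, last_loss: np.ndarray,
                  eta: float) -> np.ndarray:
    """Optimistic FTRL over the simplex with an entropic regularizer:
    exponential weights on the cumulative losses plus a prediction of
    the next loss (here, the most recent loss vector)."""
    logits = -eta * (cum_losses + last_loss)
    logits -= logits.max()        # numerical stability
    w = np.exp(logits)
    return w / w.sum()
```

When all players run such dynamics, each player's prediction tracks the slowly changing play of the others, which is the standard mechanism behind the fast $O(\log T)$ rates.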
arXiv Detail & Related papers (2024-12-10T02:23:44Z)
- The complexity of approximate (coarse) correlated equilibrium for incomplete information games [16.96984593866157]
We study the complexity of decentralized learning of approximate correlated equilibria in incomplete information games.
Our lower bound holds even for the easier solution concept of $\epsilon$-approximate coarse correlated equilibrium.
arXiv Detail & Related papers (2024-06-04T14:35:27Z)
- Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}\big(\cdots\big)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z)
- Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games [18.832436856339587]
This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data.
We propose a pessimistic model-based algorithm, called VI-LCB-Game, that uses Bernstein-style lower confidence bounds (a sketch of such a bound is below).
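A Bernstein-style lower confidence bound penalizes an empirical mean by a bonus that scales with the empirical variance rather than only the range; here is a minimal, self-contained sketch of one standard form (the constants and the full VI-LCB-Game recursion are illustrative assumptions, not the paper's).

```python
import numpy as np

def bernstein_lcb(samples: np.ndarray, value_range: float,
                  delta: float) -> float:
    """Empirical-Bernstein lower confidence bound on the mean of samples
    lying in [0, value_range]; a pessimistic value iteration subtracts a
    bonus of this form from the empirical Bellman backup."""
    n = len(samples)
    log_term = np.log(3.0 / delta)
    bonus = (np.sqrt(2.0 * samples.var() * log_term / n)
             + 3.0 * value_range * log_term / n)
    return float(samples.mean() - bonus)
```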
arXiv Detail & Related papers (2022-06-08T17:58:06Z)
- Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
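The factor can be checked termwise: since $X^2A \le \max\{X,Y\}\,XA$ and $Y^2B \le \max\{X,Y\}\,YB$, the ratio of the two bounds satisfies
$$\frac{X^2A + Y^2B}{XA + YB} \le \max\{X, Y\}.$$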
arXiv Detail & Related papers (2022-02-03T18:18:28Z)
- When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently? [10.397170312149067]
This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games.
First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5 S \max_{i\le m} A_i / \epsilon^2)$ episodes.
Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{\mathcal{O}}(\cdots)$ episodes.
arXiv Detail & Related papers (2021-10-08T15:06:22Z)
- Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty".
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
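With linear function approximation, optimism in the face of uncertainty is typically implemented by adding an elliptical confidence bonus to the point estimate; below is a minimal sketch of that generic bonus (the names, the confidence radius `beta`, and the covariance handling are illustrative assumptions, not the paper's Nash-UCRL-VTR).

```python
import numpy as np

def optimistic_q(phi: np.ndarray, theta: np.ndarray,
                 cov: np.ndarray, beta: float) -> float:
    """Optimistic Q-value under a linear model: the point estimate
    phi . theta plus an elliptical bonus beta * sqrt(phi^T cov^{-1} phi),
    where cov is the (regularized) feature covariance matrix."""
    bonus = beta * np.sqrt(phi @ np.linalg.solve(cov, phi))
    return float(phi @ theta + bonus)
```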
arXiv Detail & Related papers (2021-02-15T09:09:16Z)
- Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision.
We propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.