Towards General Function Approximation in Zero-Sum Markov Games
- URL: http://arxiv.org/abs/2107.14702v1
- Date: Fri, 30 Jul 2021 15:25:13 GMT
- Title: Towards General Function Approximation in Zero-Sum Markov Games
- Authors: Baihe Huang and Jason D. Lee and Zhaoran Wang and Zhuoran Yang
- Abstract summary: This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves.
Provably efficient algorithms for both decoupled and coordinated settings are developed.
- Score: 126.58493169301012
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper considers two-player zero-sum finite-horizon Markov games with
simultaneous moves. The study focuses on the challenging settings where the
value function or the model is parameterized by general function classes.
Provably efficient algorithms for both decoupled and coordinated settings are
developed. In the decoupled setting where the agent controls a single player
and plays against an arbitrary opponent, we propose a new model-free algorithm.
The sample complexity is governed by the Minimax Eluder dimension -- a new
dimension of the function class in Markov games. As a special case, this method
improves the state-of-the-art algorithm by a $\sqrt{d}$ factor in the regret
when the reward function and transition kernel are parameterized with
$d$-dimensional linear features. In the coordinated setting where both
players are controlled by the agent, we propose a model-based algorithm and a
model-free algorithm. In the model-based algorithm, we prove that the sample
complexity can be bounded by a generalization of the Witness rank to Markov games.
The model-free algorithm enjoys a $\sqrt{K}$-regret upper bound where $K$ is
the number of episodes. Our algorithms are based on new techniques of alternate
optimism.
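The abstract names "alternate optimism" only at a high level. As a loose illustration of optimism in zero-sum Markov games, and not the paper's actual algorithm (which handles general function classes), the sketch below runs finite-horizon value iteration on an estimated tabular model, adds an exploration bonus to the max player's payoffs, and solves each stage's zero-sum matrix game by linear programming. The function names, the tabular setup, and the bonus shape are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and max-player strategy of the zero-sum matrix game
    max_x min_y x^T A y, solved as a linear program over (v, x)."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[0] = -1.0                                  # linprog minimizes, so minimize -v
    A_ub = np.hstack([np.ones((n, 1)), -A.T])    # v - (A^T x)_j <= 0 for every column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([[0.0], np.ones(m)]).reshape(1, -1)  # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * m   # v free, x a probability vector
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0], res.x[1:]

def optimistic_value_iteration(P_hat, R_hat, bonus, H):
    """Finite-horizon value iteration on an estimated model, optimistic for
    the max player. Shapes: R_hat[s, a, b] rewards in [0, 1],
    P_hat[s, a, b, s'] transition probabilities, bonus[h, s] a scalar
    exploration bonus (a stand-in for a real confidence width)."""
    S, A, B = R_hat.shape
    V = np.zeros((H + 1, S))                     # V[H] = 0 at the horizon
    for h in reversed(range(H)):
        for s in range(S):
            Q = R_hat[s] + bonus[h, s] + P_hat[s] @ V[h + 1]      # (A, B) stage payoffs
            V[h, s], _ = matrix_game_value(np.minimum(Q, H - h))  # clip at max return
    return V

# Tiny random instance, purely to show the interfaces fit together.
rng = np.random.default_rng(0)
S, A, B, H = 3, 2, 2, 4
R_hat = rng.uniform(size=(S, A, B))
P_hat = rng.dirichlet(np.ones(S), size=(S, A, B))   # each (s, a, b) row sums to 1
bonus = 0.1 * np.ones((H, S))
print(optimistic_value_iteration(P_hat, R_hat, bonus, H)[0])
```

A pessimistic counterpart (subtracting the bonus and clipping below at zero) would give the paired upper and lower estimates that alternate-optimism-style analyses typically maintain; this sketch keeps only the optimistic half.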
Related papers
- Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
Our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty".
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z)
- A Sharp Analysis of Model-based Reinforcement Learning with Self-Play [49.88233710867315]
We present a sharp analysis of model-based self-play algorithms for multi-agent Markov games.
We design an algorithm -- Optimistic Nash Value Iteration (Nash-VI) for two-player zero-sum Markov games.
We further adapt our analysis to designing a provably efficient task-agnostic algorithm for zero-sum Markov games.
arXiv Detail & Related papers (2020-10-04T15:27:39Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap (a sketch of this gap criterion follows the list).
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
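The duality gap minimized in the offline setting above has a simple closed form in the one-step (matrix game) special case. Below is a minimal sketch under that assumption, with a payoff matrix A and mixed strategies x (max player) and y (min player); the cited paper treats full Markov games with function approximation, so this is only the matrix-game analogue, and the function name duality_gap is illustrative.

```python
import numpy as np

def duality_gap(A, x, y):
    """Duality gap of the strategy pair (x, y) in the zero-sum matrix game
    with payoff matrix A (payoffs to the max player): the max player's
    best-response value against y minus the min player's best-response
    value against x. Zero exactly at a Nash equilibrium; otherwise it
    measures the total exploitability of the pair."""
    return float(np.max(A @ y) - np.min(x @ A))

# Uniform play in rock-paper-scissors is a Nash equilibrium, so the gap vanishes.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
u = np.ones(3) / 3
assert abs(duality_gap(A, u, u)) < 1e-12
```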