Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model
- URL: http://arxiv.org/abs/2208.10458v1
- Date: Mon, 22 Aug 2022 17:24:55 GMT
- Title: Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model
- Authors: Gen Li, Yuejie Chi, Yuting Wei, Yuxin Chen
- Abstract summary: Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $varepsilon$-approximate Markov NE policy using $$ widetildeObigg.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
- Score: 50.38446482252857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper is concerned with two-player zero-sum Markov games -- arguably the
most basic setting in multi-agent reinforcement learning -- with the goal of
learning a Nash equilibrium (NE) sample-optimally. All prior results suffer
from at least one of the two obstacles: the curse of multiple agents and the
barrier of long horizon, regardless of the sampling protocol in use. We take a
step towards settling this problem, assuming access to a flexible sampling
mechanism: the generative model. Focusing on non-stationary finite-horizon
Markov games, we develop a learning algorithm
$\mathsf{Nash}\text{-}\mathsf{Q}\text{-}\mathsf{FTRL}$ and an adaptive sampling
scheme that leverage the optimism principle in adversarial learning
(particularly the Follow-the-Regularized-Leader (FTRL) method), with a delicate
design of bonus terms that ensure certain decomposability under the FTRL
dynamics. Our algorithm learns an $\varepsilon$-approximate Markov NE policy
using
$$ \widetilde{O}\bigg( \frac{H^4 S(A+B)}{\varepsilon^2} \bigg) $$ samples,
where $S$ is the number of states, $H$ is the horizon, and $A$ (resp.~$B$)
denotes the number of actions for the max-player (resp.~min-player). This is
nearly un-improvable in a minimax sense. Along the way, we derive a refined
regret bound for FTRL that makes explicit the role of variance-type quantities,
which might be of independent interest.
Related papers
Err
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.