Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model
- URL: http://arxiv.org/abs/2208.10458v1
- Date: Mon, 22 Aug 2022 17:24:55 GMT
- Title: Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model
- Authors: Gen Li, Yuejie Chi, Yuting Wei, Yuxin Chen
- Abstract summary: Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}\big( \frac{H^4 S(A+B)}{\varepsilon^2} \big)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
- Score: 50.38446482252857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper is concerned with two-player zero-sum Markov games -- arguably the
most basic setting in multi-agent reinforcement learning -- with the goal of
learning a Nash equilibrium (NE) sample-optimally. All prior results suffer
from at least one of the two obstacles: the curse of multiple agents and the
barrier of long horizon, regardless of the sampling protocol in use. We take a
step towards settling this problem, assuming access to a flexible sampling
mechanism: the generative model. Focusing on non-stationary finite-horizon
Markov games, we develop a learning algorithm
$\mathsf{Nash}\text{-}\mathsf{Q}\text{-}\mathsf{FTRL}$ and an adaptive sampling
scheme that leverage the optimism principle in adversarial learning
(particularly the Follow-the-Regularized-Leader (FTRL) method), with a delicate
design of bonus terms that ensure certain decomposability under the FTRL
dynamics. Our algorithm learns an $\varepsilon$-approximate Markov NE policy
using
$$ \widetilde{O}\bigg( \frac{H^4 S(A+B)}{\varepsilon^2} \bigg) $$ samples,
where $S$ is the number of states, $H$ is the horizon, and $A$ (resp.~$B$)
denotes the number of actions for the max-player (resp.~min-player). This is
nearly un-improvable in a minimax sense. Along the way, we derive a refined
regret bound for FTRL that makes explicit the role of variance-type quantities,
which might be of independent interest.
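
The algorithmic core is FTRL with entropy regularization (i.e., exponential weights). As a rough, hypothetical illustration only -- Nash-Q-FTRL itself couples FTRL with adaptive sampling and carefully designed bonus terms inside a value-learning loop -- the sketch below runs entropy-regularized FTRL self-play on a single zero-sum matrix game; since both players have vanishing regret, the time-averaged strategies form an approximate Nash equilibrium.
```python
import numpy as np

def ftrl_selfplay(G, T=5000, eta=0.05):
    """Entropy-regularized FTRL (multiplicative weights) self-play on a
    zero-sum matrix game G: the row player maximizes x^T G y, the column
    player minimizes it.  Returns the time-averaged strategies."""
    A, B = G.shape
    row_cum = np.zeros(A)      # cumulative payoff of each row action
    col_cum = np.zeros(B)      # cumulative payoff conceded by each column action
    x_avg, y_avg = np.zeros(A), np.zeros(B)
    for _ in range(T):
        x = np.exp(eta * (row_cum - row_cum.max()))
        x /= x.sum()           # FTRL with entropy regularizer = softmax of payoffs
        y = np.exp(-eta * (col_cum - col_cum.min()))
        y /= y.sum()           # the minimizing player exponentiates negated payoffs
        x_avg += x / T
        y_avg += y / T
        row_cum += G @ y       # expected payoff of each row action against y
        col_cum += G.T @ x     # expected payoff conceded by each column action against x
    return x_avg, y_avg

def duality_gap(G, x, y):
    """Exploitability of (x, y): zero exactly at a Nash equilibrium."""
    return (G @ y).max() - (G.T @ x).min()

# Rock-paper-scissors: the averaged strategies approach the uniform equilibrium,
# and the duality gap shrinks as T grows.
G = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
x, y = ftrl_selfplay(G)
print(x, y, duality_gap(G, x, y))
```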
Related papers
- Provably Efficient Generalized Lagrangian Policy Optimization for Safe
Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games, which are handled via a generalized Lagrangian formulation.
We develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem.
Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step.
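As a toy illustration of this primal-dual update pattern (not the paper's algorithm; the one-shot problem, names, and step sizes below are made up for illustration), the sketch performs an entropy-mirror-descent step on a primal action distribution and a projected gradient step on a single Lagrange multiplier.
```python
import numpy as np

def primal_dual(r, c, b, T=2000, eta_x=0.1, eta_lam=0.05, lam_max=10.0):
    """Maximize x^T r over the simplex subject to x^T c >= b, via the
    Lagrangian L(x, lam) = x^T r + lam * (x^T c - b):
      - primal x: online mirror descent with the entropy mirror map
        (an exponentiated-gradient ascent step),
      - dual lam: projected gradient descent onto [0, lam_max]."""
    n = len(r)
    x = np.ones(n) / n
    lam = 0.0
    x_avg = np.zeros(n)
    for _ in range(T):
        grad_x = r + lam * c                   # gradient of L in x
        x = x * np.exp(eta_x * grad_x)         # mirror (multiplicative) ascent step
        x /= x.sum()
        slack = x @ c - b                      # gradient of L in lam
        lam = np.clip(lam - eta_lam * slack, 0.0, lam_max)
        x_avg += x / T
    return x_avg, lam

# Hypothetical numbers: reward favors action 0, but the constraint
# x^T c >= 0.5 forces mass onto the "safe" action 1.
r = np.array([1.0, 0.2])
c = np.array([0.0, 1.0])
x, lam = primal_dual(r, c, b=0.5)
print(x, lam)   # roughly half the mass on action 1, so x^T c is near 0.5
```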
arXiv Detail & Related papers (2023-05-31T22:09:24Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
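For intuition, a low-rank transition kernel factorizes as $P(s'\mid s,a) = \phi(s,a)^\top \mu(s')$ for some $d$-dimensional representation $\phi$ and measures $\mu$; the short sketch below (sizes and randomness are arbitrary, purely illustrative) constructs such a rank-$d$ kernel explicitly.
```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 20, 4, 3   # hypothetical sizes: states, actions, rank

# A rank-d transition kernel: P(s' | s, a) = phi(s, a)^T mu(s').  In the
# low-rank Markov game setting, phi is an unknown (possibly nonlinear)
# representation to be learned; here it is just sampled for illustration.
phi = rng.dirichlet(np.ones(d), size=(S, A))   # (S, A, d), each row a distribution
mu = rng.dirichlet(np.ones(S), size=d)         # (d, S), each row a distribution
P = phi @ mu                                   # (S, A, S), a valid transition kernel
assert np.allclose(P.sum(axis=-1), 1.0)
```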
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games [18.832436856339587]
This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data.
We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game.
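The flavor of a Bernstein-style lower confidence bound can be sketched in a few lines (generic, with illustrative constants; the actual VI-LCB-Game bonus design, matrix-game subroutine, and offline analysis are more involved).
```python
import numpy as np

def bernstein_penalty(P_hat_sa, V, n_sa, H, log_term):
    """Bernstein-style confidence width for one (state, action) pair:
    combines the empirical variance of V under the estimated transition
    P_hat_sa with a lower-order 1/n term.  Constants are illustrative."""
    var = P_hat_sa @ (V ** 2) - (P_hat_sa @ V) ** 2   # Var_{s' ~ P_hat}[V(s')]
    return np.sqrt(2 * var * log_term / n_sa) + 7 * H * log_term / (3 * n_sa)

def pessimistic_backup(r_sa, P_hat_sa, V_next, n_sa, H, log_term):
    """One lower-confidence-bound (pessimistic) value backup:
    subtract the Bernstein penalty, then clip into the valid range [0, H]."""
    q = r_sa + P_hat_sa @ V_next - bernstein_penalty(P_hat_sa, V_next, n_sa, H, log_term)
    return float(np.clip(q, 0.0, H))

# Hypothetical numbers: 3 next states, 40 offline samples for this (s, a).
P_hat = np.array([0.5, 0.3, 0.2]); V = np.array([2.0, 1.0, 0.0])
print(pessimistic_backup(1.0, P_hat, V, n_sa=40, H=5, log_term=np.log(100)))
```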
arXiv Detail & Related papers (2022-06-08T17:58:06Z)
- Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games [5.205867750232226]
This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games.
We propose an algorithm in which each agent independently runs optimistic V-learning to efficiently explore the unknown environment.
We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A/\epsilon^2)$ episodes.
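To make the solution concept concrete, the snippet below (a hypothetical helper for one-shot two-player games, not part of the paper) measures how far a joint action distribution is from a coarse correlated equilibrium; an $\epsilon$-approximate CCE is one whose gap is at most $\epsilon$.
```python
import numpy as np

def cce_gap(R1, R2, mu):
    """Distance of a joint action distribution mu (shape A x B) from a coarse
    correlated equilibrium of the two-player game with payoff matrices R1, R2:
    the largest gain either player obtains by committing to a single action
    while the opponent keeps playing its marginal of mu."""
    p1, p2 = mu.sum(axis=1), mu.sum(axis=0)      # the two players' marginals
    gain1 = (R1 @ p2).max() - np.sum(mu * R1)    # player 1's best fixed deviation
    gain2 = (p1 @ R2).max() - np.sum(mu * R2)    # player 2's best fixed deviation
    return max(gain1, gain2)

# mu is an epsilon-approximate CCE exactly when cce_gap(R1, R2, mu) <= epsilon.
```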
arXiv Detail & Related papers (2021-10-12T02:01:22Z)
- A Sharp Analysis of Model-based Reinforcement Learning with Self-Play [49.88233710867315]
We present a sharp analysis of model-based self-play algorithms for multi-agent Markov games.
We design an algorithm -- Optimistic Nash Value Iteration (Nash-VI) for two-player zero-sum Markov games.
We further adapt our analysis to designing a provably efficient task-agnostic algorithm for zero-sum Markov games.
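For background, plain Nash value iteration with a known model performs backward induction, solving a zero-sum matrix game at every (step, state) pair; a minimal sketch is below (the paper's Optimistic Nash-VI additionally works with an estimated model and exploration bonuses, which this sketch omits).
```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(G):
    """Value of the zero-sum matrix game G (row player maximizes x^T G y),
    via the standard LP: maximize v subject to G^T x >= v * 1, x in the simplex."""
    A, B = G.shape
    c = np.zeros(A + 1); c[-1] = -1.0                  # variables [x, v], minimize -v
    A_ub = np.hstack([-G.T, np.ones((B, 1))])          # v - (G^T x)_j <= 0 for every column j
    A_eq = np.hstack([np.ones((1, A)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(B),
                  A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=[(0, None)] * A + [(None, None)])
    return res.x[-1]

def nash_value_iteration(r, P):
    """Exact backward induction for a known finite-horizon zero-sum Markov game.
    r: rewards of shape (H, S, A, B); P: transitions of shape (H, S, A, B, S).
    At each step and state, the backed-up (A, B) stage game is solved exactly."""
    H, S, A, B = r.shape
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):
        for s in range(S):
            Q = r[h, s] + P[h, s] @ V[h + 1]           # stage game of shape (A, B)
            V[h, s] = matrix_game_value(Q)
    return V
```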
arXiv Detail & Related papers (2020-10-04T15:27:39Z)
- Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)