Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum
Markov Games
- URL: http://arxiv.org/abs/2102.08903v1
- Date: Wed, 17 Feb 2021 17:49:57 GMT
- Title: Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum
Markov Games
- Authors: Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du
- Abstract summary: This paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games.
We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.
- Score: 95.70078702838654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy gradient methods are widely used in solving two-player zero-sum games
to achieve superhuman performance in practice. However, it remains elusive when
they can provably find a near-optimal solution and how many samples and
iterations are needed. The current paper studies natural extensions of the Natural
Policy Gradient algorithm for solving two-player zero-sum games where function
approximation is used for generalization across states. We thoroughly
characterize the algorithms' performance in terms of the number of samples,
number of iterations, concentrability coefficients, and approximation error. To
our knowledge, this is the first quantitative analysis of policy gradient
methods with function approximation for two-player zero-sum Markov games.
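As a minimal illustration of the update these methods build on (a toy sketch under simplifying assumptions, not the paper's sampled algorithm with function approximation): with tabular softmax policies on a single-state zero-sum matrix game, the Natural Policy Gradient step reduces to multiplicative weights, and the time-averaged strategies of the two players approach a Nash equilibrium.

```python
# Toy sketch only: single-state zero-sum matrix game, exact gradients,
# tabular softmax policies -- not the paper's sampled, function-approximation
# algorithm.
import numpy as np

def npg_matrix_game(A, iters=5000, eta=0.1):
    """Simultaneous NPG (multiplicative weights under softmax policies).

    The row player maximizes x^T A y, the column player minimizes it.
    Returns the time-averaged strategies, which approach a Nash equilibrium.
    """
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n        # start from uniform policies
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        gx, gy = A @ y, A.T @ x                  # each player's action values
        x = x * np.exp(eta * gx); x /= x.sum()   # NPG step for the max player
        y = y * np.exp(-eta * gy); y /= y.sum()  # NPG step for the min player
        x_avg += x; y_avg += y
    return x_avg / iters, y_avg / iters

# Matching pennies: the unique Nash equilibrium is uniform play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = npg_matrix_game(A)
print(x_bar, y_bar)  # both close to [0.5, 0.5]
```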
Related papers
- Breaking the Curse of Multiagency: Provably Efficient Decentralized
Multi-Agent RL with Function Approximation [44.051717720483595]
This paper presents the first line of MARL algorithms that provably resolve the curse of multiagency under function approximation.
In exchange for learning a weaker version of CCEs, this algorithm applies to a wider range of problems under generic function approximation.
Our algorithm always outputs Markov CCEs and achieves an optimal rate of $\widetilde{\mathcal{O}}(\epsilon^{-2})$ for finding $\epsilon$-optimal solutions.
arXiv Detail & Related papers (2023-02-13T18:59:25Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- HSVI can solve zero-sum Partially Observable Stochastic Games [7.293053431456775]
State-of-the-art methods for solving two-player zero-sum imperfect-information games rely on linear programming or dynamic regret minimization.
We propose a novel family of promising approaches complementing those relying on linear programming or iterative methods.
arXiv Detail & Related papers (2022-10-26T11:41:57Z)
- Reinforcement Learning with Unbiased Policy Evaluation and Linear
Function Approximation [11.345796608258434]
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes.
We analyze two algorithms: the first uses a least squares approach, in which a new set of weights associated with the feature vectors is obtained via least squares at each iteration.
The second uses a two-time-scale approximation scheme, taking several gradient descent steps toward the least squares solution; a toy sketch contrasting the two appears after this list.
arXiv Detail & Related papers (2022-10-13T20:16:19Z)
- Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to play against smooth algorithms deployed by the adversary.
Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles) and to find an appropriate solution within a bounded number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation (FA), results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)
- Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games [1.1852406625172216]
Zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied.
Two policy optimization methods that rely on policy gradient are proposed.
arXiv Detail & Related papers (2020-09-02T13:49:08Z)
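To make the least-squares versus gradient-descent distinction in the policy-evaluation entry above concrete, here is a minimal sketch under assumed ingredients (synthetic features, noisy value targets; the helper names are illustrative, not from the cited paper): the first variant fits the value weights in closed form, while the second takes several gradient descent steps toward that same solution.

```python
# Toy sketch only: linear value estimation V(s) ~ Phi @ w on synthetic data.
import numpy as np

def ls_weights(Phi, targets):
    """Closed-form least squares fit of the value weights."""
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w

def gd_weights(Phi, targets, w, steps=200, lr=0.05):
    """Several gradient descent steps toward the least squares solution."""
    for _ in range(steps):
        grad = Phi.T @ (Phi @ w - targets) / len(targets)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 4))                      # features for 100 states
w_true = np.array([1.0, -0.5, 0.2, 0.0])
targets = Phi @ w_true + 0.1 * rng.normal(size=100)  # noisy value targets
print(ls_weights(Phi, targets))                      # close to w_true
print(gd_weights(Phi, targets, np.zeros(4)))         # approaches the same fit
```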
This list is automatically generated from the titles and abstracts of the papers on this site.