Independent Natural Policy Gradient Always Converges in Markov Potential Games
- URL: http://arxiv.org/abs/2110.10614v1
- Date: Wed, 20 Oct 2021 15:15:10 GMT
- Title: Independent Natural Policy Gradient Always Converges in Markov Potential Games
- Authors: Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas
- Abstract summary: We study mixed cooperative/competitive games called Markov Potential Games (MPGs).
We prove that Independent Natural Policy Gradient always converges in MPGs using constant learning rates.
We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.
- Score: 18.43622733760659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning has been successfully applied to
fully-cooperative and fully-competitive environments, but little is currently
known about mixed cooperative/competitive environments. In this paper, we focus
on a particular class of multi-agent mixed cooperative/competitive stochastic
games called Markov Potential Games (MPGs), which include cooperative games as
a special case. Recent results have shown that independent policy gradient
converges in MPGs, but it was not known whether Independent Natural Policy
Gradient converges in MPGs as well. We prove that Independent Natural Policy
Gradient always converges in the last iterate when run with constant learning
rates. The proof deviates from existing approaches; the main challenge is that
Markov Potential Games, unlike single-agent settings, do not have a unique
optimal value, so different initializations can lead to different limit-point
values. We complement our theoretical results with
experiments that indicate that Natural Policy Gradient outperforms Policy
Gradient in routing games and congestion games.
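The paper studies independent Natural Policy Gradient run simultaneously by all agents. Under tabular softmax parameterization, each agent's NPG step reduces to a multiplicative-weights update on its own advantage function, with the other agents' policies held fixed. Below is a minimal sketch of this update on a single-state congestion game (the simplest special case of an MPG); the agent and facility counts, the payoffs, and the learning rate are illustrative assumptions, not the authors' exact experimental setup.

```python
# A minimal sketch of independent Natural Policy Gradient (NPG) with a
# constant learning rate on a single-state congestion game. The game size,
# payoffs, and step size are hypothetical choices for illustration.
import numpy as np

N_AGENTS = 4      # number of players (hypothetical)
N_FACILITIES = 3  # facilities each player can choose from (hypothetical)
ETA = 0.5         # constant learning rate
STEPS = 200

rng = np.random.default_rng(0)
# Each agent keeps an independent softmax policy over facilities.
policies = rng.dirichlet(np.ones(N_FACILITIES), size=N_AGENTS)

def expected_load(policies, exclude):
    """Expected number of *other* agents on each facility."""
    mask = np.ones(len(policies), dtype=bool)
    mask[exclude] = False
    return policies[mask].sum(axis=0)

for t in range(STEPS):
    new_policies = np.empty_like(policies)
    for i in range(N_AGENTS):
        # Q_i(a): agent i's expected payoff for facility a, holding the other
        # agents' current mixed strategies fixed. A facility's cost is its
        # expected load (including agent i), so payoff = -cost.
        q = -(1.0 + expected_load(policies, exclude=i))
        v = policies[i] @ q            # value of agent i's current policy
        adv = q - v                    # advantage of each action
        # Under softmax parameterization, the NPG step is multiplicative:
        # pi_{t+1}(a) proportional to pi_t(a) * exp(eta * A(a)).
        logits = np.log(policies[i]) + ETA * adv
        logits -= logits.max()         # numerical stability
        new_policies[i] = np.exp(logits) / np.exp(logits).sum()
    policies = new_policies

print("final policies (rows = agents, columns = facilities):")
print(np.round(policies, 3))
```

In this symmetric example the independent updates, despite using a constant learning rate, typically settle to a joint policy in which the agents spread themselves across facilities, i.e. an equilibrium of the congestion game; the limit point reached depends on the random initialization, consistent with the non-uniqueness discussed in the abstract.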
Related papers
- Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization [12.612009339150504]
This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning.
We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE).
arXiv Detail & Related papers (2024-05-04T22:48:53Z)
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Provably Learning Nash Policies in Constrained Markov Potential Games [90.87573337770293]
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents.
Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable.
arXiv Detail & Related papers (2023-06-13T13:08:31Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence [30.084357461497042]
We learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large.
We propose new independent policy gradient algorithms that are run by all players in tandem.
We identify a class of independent policy gradient algorithms that enjoys convergence for both zero-sum Markov games and Markov cooperative games when the players are oblivious to the type of game being played.
arXiv Detail & Related papers (2022-02-08T20:09:47Z)
- Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games [33.36015509903024]
We present a novel definition of Markov Potential Games (MPG).
MPG generalizes prior attempts at capturing complex stateful multi-agent coordination.
We show that MPGs showcase standard desirable properties such as the existence of deterministic Nash policies.
arXiv Detail & Related papers (2021-06-03T16:17:46Z)
- Independent Policy Gradient Methods for Competitive Reinforcement Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
arXiv Detail & Related papers (2021-01-11T23:20:42Z)