Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
- URL: http://arxiv.org/abs/2106.01969v1
- Date: Thu, 3 Jun 2021 16:17:46 GMT
- Title: Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
- Authors: Stefanos Leonardos, Will Overman, Ioannis Panageas, Georgios Piliouras
- Abstract summary: We present a novel definition of Markov Potential Games (MPG).
MPG generalizes prior attempts at capturing complex stateful multi-agent coordination.
We show that MPGs showcase standard desirable properties such as the existence of deterministic Nash policies.
- Score: 33.36015509903024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Potential games are arguably one of the most important and widely studied
classes of normal form games. They define the archetypal setting of multi-agent
coordination as all agent utilities are perfectly aligned with each other via a
common potential function. Can this intuitive framework be transplanted in the
setting of Markov Games? What are the similarities and differences between
multi-agent coordination with and without state dependence? We present a novel
definition of Markov Potential Games (MPG) that generalizes prior attempts at
capturing complex stateful multi-agent coordination. Counter-intuitively,
insights from normal-form potential games do not carry over as MPGs can consist
of settings where state-games can be zero-sum games. In the opposite direction,
Markov games where every state-game is a potential game are not necessarily
MPGs. Nevertheless, MPGs showcase standard desirable properties such as the
existence of deterministic Nash policies. In our main technical result, we
prove fast convergence of independent policy gradient to Nash policies by
adapting recent gradient dominance property arguments developed for single
agent MDPs to multi-agent learning settings.
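To make the main result concrete, the following is a minimal, self-contained sketch of independent policy gradient under the direct tabular parameterization: both agents run projected gradient ascent on their own value in tandem, with no coordination. This is not the authors' code; the game is an identical-interest Markov game (a special case of an MPG, where the shared value function itself serves as the potential), and all rewards, transitions, the step size, and the helper names (`evaluate`, `policy_gradient`, `project_rows_to_simplex`, `best_response_value`) are invented for illustration.

```python
# Minimal illustrative sketch: independent projected policy gradient in a small
# identical-interest Markov game (a special case of a Markov Potential Game).
# All numeric quantities below are made up for illustration.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)

# Shared reward r[s, a1, a2] and transition kernel P[s, a1, a2, s'].
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions, n_actions))
P = rng.uniform(size=(n_states, n_actions, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)
rho = np.full(n_states, 1.0 / n_states)        # initial state distribution

def evaluate(pi1, pi2):
    """Exact evaluation of the joint policy: values V(s) and discounted visitation d(s)."""
    pi = pi1[:, :, None] * pi2[:, None, :]     # joint policy pi[s, a1, a2]
    r_pi = np.einsum('sab,sab->s', pi, r)
    P_pi = np.einsum('sab,sabt->st', pi, P)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, (1.0 - gamma) * rho)
    return V, d

def policy_gradient(pi1, pi2, agent):
    """Gradient of the shared value w.r.t. one agent's own policy (direct parameterization)."""
    V, d = evaluate(pi1, pi2)
    Q = r + gamma * np.einsum('sabt,t->sab', P, V)          # joint-action Q[s, a1, a2]
    Q_i = np.einsum('sab,sb->sa', Q, pi2) if agent == 1 else np.einsum('sab,sa->sb', Q, pi1)
    return d[:, None] * Q_i / (1.0 - gamma)

def project_rows_to_simplex(x):
    """Euclidean projection of each row of x onto the probability simplex."""
    u = np.sort(x, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1) - 1.0
    ind = np.arange(1, x.shape[1] + 1)
    k = ((u - css / ind) > 0).cumsum(axis=1).argmax(axis=1)  # last index where condition holds
    theta = css[np.arange(x.shape[0]), k] / (k + 1)
    return np.maximum(x - theta[:, None], 0.0)

def best_response_value(pi_other, agent):
    """Best-response value against a fixed opponent, via value iteration on the induced MDP."""
    if agent == 1:
        r_i = np.einsum('sab,sb->sa', r, pi_other)
        P_i = np.einsum('sabt,sb->sat', P, pi_other)
    else:
        r_i = np.einsum('sab,sa->sb', r, pi_other)
        P_i = np.einsum('sabt,sa->sbt', P, pi_other)
    V = np.zeros(n_states)
    for _ in range(1000):
        V = (r_i + gamma * P_i @ V).max(axis=1)
    return float(rho @ V)

# Independent learning: both agents run projected policy gradient ascent in tandem,
# each using only the gradient of its own (here, shared) value.
pi1 = np.full((n_states, n_actions), 1.0 / n_actions)
pi2 = np.full((n_states, n_actions), 1.0 / n_actions)
eta = 0.01
for _ in range(2000):
    g1 = policy_gradient(pi1, pi2, agent=1)
    g2 = policy_gradient(pi1, pi2, agent=2)
    pi1 = project_rows_to_simplex(pi1 + eta * g1)
    pi2 = project_rows_to_simplex(pi2 + eta * g2)

V, _ = evaluate(pi1, pi2)
value = float(rho @ V)
print("joint value:", round(value, 4))
# At a Nash policy neither agent gains by deviating unilaterally, so these gaps should be ~0.
print("agent 1 deviation gain:", round(best_response_value(pi2, agent=1) - value, 6))
print("agent 2 deviation gain:", round(best_response_value(pi1, agent=2) - value, 6))
```

Because first-order stationary points of each agent's value coincide with Nash policies in an MPG (the gradient dominance argument adapted from single-agent MDPs), the printed deviation gains should be close to zero after training.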
Related papers
- Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players [17.55330497310932]
Markov Potential Games (MPGs) form an important sub-class of Markov games.
MPGs include as a special case the identical-interest setting where all the agents share the same reward function.
Scaling the performance of Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems.
arXiv Detail & Related papers (2024-08-15T11:02:05Z)
- Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization [12.612009339150504]
This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning.
We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE); see the sketch after this list.
arXiv Detail & Related papers (2024-05-04T22:48:53Z)
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Provably Learning Nash Policies in Constrained Markov Potential Games [90.87573337770293]
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents.
Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable.
arXiv Detail & Related papers (2023-06-13T13:08:31Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even in the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) policy replay to tackle the non-stationarity incurred by multiple agents and by the use of function approximation; and (2) separating the learning of Markov equilibria from exploration in the Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence [30.084357461497042]
We learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large.
We propose new independent policy gradient algorithms that are run by all players in tandem.
We identify a class of independent policy gradient algorithms that enjoy convergence guarantees for both zero-sum Markov games and Markov cooperative games when the players are oblivious to the type of game being played.
arXiv Detail & Related papers (2022-02-08T20:09:47Z)
- Independent Natural Policy Gradient Always Converges in Markov Potential Games [18.43622733760659]
We study mixed cooperative/competitive games called Markov Potential Games (MPGs).
We prove that Independent Natural Policy Gradient always converges in MPGs using constant learning rates.
We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.
arXiv Detail & Related papers (2021-10-20T15:15:10Z)
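The entropy-regularized natural policy gradient entry above ("Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization") reports linear convergence to the quantal response equilibrium (QRE). The sketch below is not that paper's algorithm or code; it only illustrates the mechanism in the simplest setting, a two-player matrix game with softmax policies, where the entropy-regularized NPG step reduces to the multiplicative update pi(a) ∝ pi(a)^(1 - eta*tau) * exp(eta * q(a)), whose fixed points satisfy pi(a) ∝ exp(q(a)/tau), i.e. they are QRE. The payoff matrices, temperature tau, and step size eta are made up, with tau chosen large enough that the smoothed-response map is a contraction ("sufficient entropy regularization").

```python
# Minimal illustrative sketch: independent entropy-regularized natural policy
# gradient (NPG) in a two-player matrix game. All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.0, 1.0, size=(3, 3))    # player 1's payoff for (a1, a2)
B = rng.uniform(0.0, 1.0, size=(3, 3))    # player 2's payoff for (a1, a2)
tau, eta = 2.0, 0.2                       # entropy temperature and step size

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Reference QRE via fixed-point iteration (a contraction when tau is large enough).
x_star = np.full(3, 1.0 / 3)
y_star = np.full(3, 1.0 / 3)
for _ in range(5000):
    x_star = softmax((A @ y_star) / tau)
    y_star = softmax((B.T @ x_star) / tau)

# Independent entropy-regularized NPG: each player uses only its own expected payoffs.
x = np.full(3, 1.0 / 3)
y = np.full(3, 1.0 / 3)
gaps = []
for _ in range(200):
    qx, qy = A @ y, B.T @ x                                  # expected payoff per own action
    x = softmax((1.0 - eta * tau) * np.log(x) + eta * qx)    # pi ∝ pi^(1-eta*tau) * exp(eta*q)
    y = softmax((1.0 - eta * tau) * np.log(y) + eta * qy)
    gaps.append(np.abs(x - x_star).sum() + np.abs(y - y_star).sum())

print("L1 distance to QRE after 10 / 50 / 200 steps:",
      [round(gaps[t], 6) for t in (9, 49, 199)])
```

With this choice of tau relative to the payoff range, the printed distances should shrink roughly geometrically, mirroring the linear rate claimed in that entry.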
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.