Independent and Decentralized Learning in Markov Potential Games
- URL: http://arxiv.org/abs/2205.14590v6
- Date: Fri, 10 Nov 2023 07:03:57 GMT
- Title: Independent and Decentralized Learning in Markov Potential Games
- Authors: Chinmay Maheshwari and Manxi Wu and Druv Pai and Shankar Sastry
- Abstract summary: We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate.
In each stage, players update their estimates of the Q-function, which evaluates their total contingent payoff, based on the realized one-stage reward.
We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1.
- Score: 3.8779763612314633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a multi-agent reinforcement learning dynamics, and analyze its
convergence in infinite-horizon discounted Markov potential games. We focus on
the independent and decentralized setting, where players do not have knowledge
of the game model and cannot coordinate. In each stage, players update their
estimates of the Q-function, which evaluates their total contingent payoff, based on
the realized one-stage reward in an asynchronous manner. Then, players
independently update their policies by incorporating an optimal one-stage
deviation strategy based on the estimated Q-function. A key feature of the
learning dynamics is that the Q-function estimates are updated at a faster
timescale than the policies. We prove that the policies induced by our learning
dynamics converge to the set of stationary Nash equilibria in Markov potential
games with probability 1. Our results highlight the efficacy of simple learning
dynamics in reaching the set of stationary Nash equilibria even in
environments with minimal information available.
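The abstract describes two-timescale dynamics: a fast, asynchronous Q-function update driven by the realized one-stage reward, and a slower policy update toward an optimal one-stage deviation. Below is a minimal tabular Python sketch of that structure; the environment interface, the step-size schedules, and the exact form of the deviation step are illustrative assumptions, not the authors' specification.

```python
import numpy as np

def independent_learning(env, n_players, n_states, n_actions,
                         gamma=0.99, steps=100_000):
    """Minimal sketch: each player learns independently, with no coordination
    and no knowledge of the game model. All schedules are illustrative."""
    Q = [np.zeros((n_states, n_actions)) for _ in range(n_players)]
    pi = [np.full((n_states, n_actions), 1.0 / n_actions) for _ in range(n_players)]
    visits = [np.zeros((n_states, n_actions)) for _ in range(n_players)]

    state = env.reset()  # assumed Gym-like interface returning an integer state
    for t in range(1, steps + 1):
        actions = [np.random.choice(n_actions, p=pi[i][state]) for i in range(n_players)]
        next_state, rewards, done = env.step(actions)  # rewards: one entry per player

        for i in range(n_players):
            a = actions[i]
            visits[i][state, a] += 1
            # Fast timescale: asynchronous Q update from the realized one-stage
            # reward; only the visited (state, action) entry is touched.
            alpha = visits[i][state, a] ** -0.6                 # assumed schedule
            cont = pi[i][next_state] @ Q[i][next_state]         # continuation value
            Q[i][state, a] += alpha * (rewards[i] + gamma * cont - Q[i][state, a])

            # Slow timescale: nudge the policy toward an optimal one-stage
            # deviation with respect to the current Q estimate.
            beta = t ** -0.9  # assumed; decays faster than alpha, so the policy adapts on the slower timescale
            deviation = np.zeros(n_actions)
            deviation[np.argmax(Q[i][state])] = 1.0
            pi[i][state] = (1 - beta) * pi[i][state] + beta * deviation

        state = env.reset() if done else next_state
    return pi
```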
Related papers
- Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games [3.8779763612314633]
We study the properties of learning algorithms in general-sum Markov games.
In particular, we focus on a decentralized algorithm where each agent adopts an actor-critic learning dynamic.
arXiv Detail & Related papers (2024-09-06T20:49:11Z)
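The entry above mentions that each agent adopts its own actor-critic learning dynamic. A minimal sketch of one such decentralized agent follows; the tabular critic, softmax actor, and learning rates are illustrative assumptions, not the cited paper's algorithm.

```python
import numpy as np

class IndependentActorCritic:
    """One decentralized agent: a tabular critic V and a softmax actor.
    Interfaces and step sizes are illustrative assumptions."""

    def __init__(self, n_states, n_actions, gamma=0.99,
                 critic_lr=0.1, actor_lr=0.01):
        self.gamma, self.critic_lr, self.actor_lr = gamma, critic_lr, actor_lr
        self.V = np.zeros(n_states)                   # critic: state values
        self.theta = np.zeros((n_states, n_actions))  # actor: logits

    def policy(self, s):
        z = np.exp(self.theta[s] - self.theta[s].max())
        return z / z.sum()

    def act(self, s):
        return np.random.choice(len(self.theta[s]), p=self.policy(s))

    def update(self, s, a, r, s_next):
        # Critic: TD(0) update from the agent's own realized reward only.
        td_error = r + self.gamma * self.V[s_next] - self.V[s]
        self.V[s] += self.critic_lr * td_error
        # Actor: policy-gradient step weighted by the TD error.
        grad = -self.policy(s)
        grad[a] += 1.0
        self.theta[s] += self.actor_lr * td_error * grad
```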
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
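The entry above concerns optimistic policy gradient methods. As a point of reference, the standard optimistic-gradient template steps along an extrapolated gradient rather than the current one; the sketch below illustrates that generic update on unconstrained parameters (e.g. policy logits) and is not the cited paper's exact method.

```python
import numpy as np

def optimistic_ascent(grad_fn, x0, eta=0.05, iters=200):
    """Generic optimistic gradient ascent: step along 2*g_t - g_{t-1},
    using the previous gradient as a prediction of the next one."""
    x = np.array(x0, dtype=float)
    g_prev = grad_fn(x)
    for _ in range(iters):
        g = grad_fn(x)
        x = x + eta * (2.0 * g - g_prev)  # optimistic extrapolation step
        g_prev = g
    return x
```

In a multi-player game, each player would apply such an update to its own policy parameters using the gradient of its own payoff.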
- Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; and (2) separating the learning of Markov equilibria from exploration in Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z)
- Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics [38.5932141555258]
We study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm.
We give a sufficient condition on the rate of exploration under which the Q-Learning dynamics are guaranteed to converge to a unique equilibrium in any game.
arXiv Detail & Related papers (2023-01-23T18:39:11Z)
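The entry above studies smooth (Boltzmann) Q-learning, where exploration is controlled by a softmax temperature. A minimal stateless sketch for one player of a repeated game follows; the payoff callable, the placeholder opponent, and the learning rate are illustrative assumptions.

```python
import numpy as np

def smooth_q_learning(payoff, n_actions, temperature=1.0, lr=0.05, iters=5000):
    """Smooth (Boltzmann) Q-learning for one player; `payoff(a_self, a_other)`
    is an assumed game interface. The cited paper's convergence condition
    concerns how much exploration (temperature) is needed."""
    Q = np.zeros(n_actions)
    for _ in range(iters):
        # Softmax (Boltzmann) policy: higher temperature means more exploration.
        scaled = Q / temperature
        z = np.exp(scaled - scaled.max())
        pi = z / z.sum()
        a = np.random.choice(n_actions, p=pi)
        a_other = np.random.randint(n_actions)   # placeholder opponent action
        r = payoff(a, a_other)
        Q[a] += lr * (r - Q[a])                  # update only the chosen action's value
    return Q
```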
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
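The entry above represents a player's mixed strategy over a continuous action space with a neural network that maps noise to actions. Below is a minimal sketch of such a randomized policy; the architecture, sizes, and the sample-based payoff estimate in the comment are illustrative assumptions, not the cited paper's setup.

```python
import numpy as np

class RandomizedPolicy:
    """A mixed strategy over a continuous action space: a small network maps
    Gaussian noise to an action, so fresh noise yields a fresh action draw."""

    def __init__(self, noise_dim, action_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.5, size=(hidden, noise_dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.5, size=(action_dim, hidden))
        self.b2 = np.zeros(action_dim)

    def sample_action(self):
        z = np.random.standard_normal(self.W1.shape[1])  # fresh noise draw
        h = np.tanh(self.W1 @ z + self.b1)
        return np.tanh(self.W2 @ h + self.b2)            # action in [-1, 1]^action_dim

# Expected payoffs can be estimated from samples alone, with no gradient access,
# e.g. (hypothetical `game.payoff`):
# payoffs = [game.payoff(policy.sample_action(), opponent.sample_action()) for _ in range(256)]
```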
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Decentralized Q-Learning in Zero-sum Markov Games [33.81574774144886]
We study multi-agent reinforcement learning (MARL) in discounted zero-sum Markov games.
We develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent.
The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective.
arXiv Detail & Related papers (2021-06-04T22:42:56Z)
- Independent Policy Gradient Methods for Competitive Reinforcement Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
arXiv Detail & Related papers (2021-01-11T23:20:42Z)
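The entry above has both players running policy gradient independently with learning rates on two different timescales. A minimal sketch for a zero-sum matrix game follows; the softmax parameterization and the specific step-size exponents are illustrative assumptions, not the cited paper's exact rates.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def two_timescale_pg(A, iters=20_000):
    """Independent policy gradient in a zero-sum matrix game with payoff matrix
    A (row player maximizes x^T A y). Each player uses only its own payoff."""
    theta_x = np.zeros(A.shape[0])
    theta_y = np.zeros(A.shape[1])
    for t in range(1, iters + 1):
        x, y = softmax(theta_x), softmax(theta_y)
        qx = A @ y                       # row player's action values
        qy = -A.T @ x                    # column player's action values (zero-sum)
        eta_x = 1.0 / t ** 0.5           # faster timescale (assumed)
        eta_y = 1.0 / t ** 0.75          # slower timescale (assumed)
        theta_x += eta_x * x * (qx - x @ qx)   # softmax policy-gradient step
        theta_y += eta_y * y * (qy - y @ qy)
    return softmax(theta_x), softmax(theta_y)
```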
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)