Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games
- URL: http://arxiv.org/abs/2303.12287v1
- Date: Wed, 22 Mar 2023 03:28:12 GMT
- Title: Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games
- Authors: Dylan J. Foster, Noah Golowich, Sham M. Kakade
- Abstract summary: We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum Markov games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
- Score: 70.19141208203227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of decentralized multi-agent reinforcement learning
in Markov games. A fundamental question is whether there exist algorithms that,
when adopted by all agents and run independently in a decentralized fashion,
lead to no-regret for each player, analogous to celebrated convergence results
in normal-form games. While recent work has shown that such algorithms exist
for restricted settings (notably, when regret is defined with respect to
deviations to Markovian policies), the question of whether independent
no-regret learning can be achieved in the standard Markov game framework was
open. We provide a decisive negative resolution to this problem, both from a
computational and a statistical perspective (per-player regret is formalized in
the sketch after the bullets below). We show that:
- Under the widely-believed assumption that PPAD-hard problems cannot be
solved in polynomial time, there is no polynomial-time algorithm that attains
no-regret in general-sum Markov games when executed independently by all
players, even when the game is known to the algorithm designer and the number
of players is a small constant.
- When the game is unknown, no algorithm, regardless of computational
efficiency, can achieve no-regret without observing a number of episodes that
is exponential in the number of players.
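For concreteness, per-player regret over K episodes can be formalized in the standard way; the notation below (value functions, joint episode policies) is generic rather than taken verbatim from the paper:

```latex
% Regret of player i over K episodes, where \pi^k = (\pi_1^k,\dots,\pi_m^k)
% is the joint policy produced in episode k and V_i^{\pi} is player i's value:
\mathrm{Reg}_i(K) = \max_{\pi_i^{\dagger}} \sum_{k=1}^{K}
  \left( V_i^{\pi_i^{\dagger},\,\pi_{-i}^{k}} - V_i^{\pi^{k}} \right)
% No-regret means \mathrm{Reg}_i(K) = o(K) for every player i. In the
% standard framework the deviation \pi_i^{\dagger} ranges over all policies;
% the restricted positive results mentioned above maximize only over
% Markovian deviations.
```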
Perhaps surprisingly, our lower bounds hold even for the seemingly easier
setting in which all agents are controlled by a centralized algorithm. They are
proven via lower bounds for a simpler problem we refer to as SparseCCE, in
which the goal is to compute a coarse correlated equilibrium that is sparse in
the sense that it can be represented as a mixture of a small number of product
policies. The crux of our approach is a novel application of aggregation
techniques from online learning, whereby we show that any algorithm for the
SparseCCE problem can be used to compute approximate Nash equilibria for
non-zero-sum normal-form games.
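To illustrate the sparse representation at the heart of SparseCCE, the following is a minimal Python sketch, not the paper's construction: it evaluates the CCE deviation gap of a joint distribution given as a uniform mixture of product policies in a two-player normal-form game. All names here (payoff matrices A and B, the mixture components) are illustrative assumptions.

```python
import numpy as np

def cce_gap(A, B, mixture):
    """Deviation gap of a joint distribution represented in 'sparse' form:
    a uniform mixture of product distributions (x, y), each of which is a
    product policy; the sparsity is len(mixture). The mixture is an
    eps-approximate CCE iff the returned gap is at most eps.

    A, B    : payoff matrices of players 1 and 2, shape (n, m).
    mixture : list of pairs (x, y); x is a distribution over rows,
              y a distribution over columns.
    """
    k = len(mixture)
    # Joint action distribution: uniform average of k product distributions.
    P = sum(np.outer(x, y) for x, y in mixture) / k
    u1 = float(np.sum(A * P))      # player 1's expected payoff under P
    u2 = float(np.sum(B * P))      # player 2's expected payoff under P
    q = P.sum(axis=0)              # marginal over player 2's actions
    p = P.sum(axis=1)              # marginal over player 1's actions
    best1 = float(np.max(A @ q))   # player 1's best fixed deviation
    best2 = float(np.max(p @ B))   # player 2's best fixed deviation
    return max(best1 - u1, best2 - u2)

# Example: uniform play in matching pennies is an exact CCE (gap 0).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
mixture = [(np.array([0.5, 0.5]), np.array([0.5, 0.5]))]
print(cce_gap(A, -A, mixture))     # -> 0.0
```

With k = 1 the mixture collapses to a single product policy, i.e., an approximate Nash equilibrium; this is one way to see the link, exploited in the reduction above, between sparsity and the hardness of Nash computation.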
Related papers
- Independent Learning in Constrained Markov Potential Games [19.083595175045073]
Constrained Markov games offer a formal framework for modeling multi-agent reinforcement learning problems.
We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria.
arXiv Detail & Related papers (2024-02-27T20:57:35Z) - Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - Breaking the Curse of Multiagents in a Large State Space: RL in Markov
Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) using policy replay to tackle the non-stationarity incurred by multiple agents and by function approximation; and (2) separating the learning of Markov equilibria from exploration in the Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z) - Learning in Multi-Player Stochastic Games [1.0878040851638]
We consider the problem of simultaneous learning in games with many players in the finite-horizon setting.
While the typical target solution for a game is a Nash equilibrium, this is intractable with many players.
We turn to a different target: algorithms which generate an equilibrium when they are used by all players.
arXiv Detail & Related papers (2022-10-25T19:02:03Z) - Regret Minimization and Convergence to Equilibria in General-sum Markov
Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z) - Learning in Congestion Games with Bandit Feedback [45.4542525044623]
We investigate congestion games, a class of games with benign theoretical structure and broad real-world applications.
We first propose a centralized algorithm based on the optimism in the face of uncertainty principle for congestion games.
We then propose a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design.
arXiv Detail & Related papers (2022-06-04T02:32:26Z) - Independent Policy Gradient for Large-Scale Markov Potential Games:
Sharper Rates, Function Approximation, and Game-Agnostic Convergence [30.084357461497042]
We learn a Nash equilibrium of a Markov potential game (MPG) in which the size of the state space and/or the number of players can be very large.
We propose new independent policy gradient algorithms that are run by all players in tandem.
We identify a class of independent policy gradient algorithms that converges for both zero-sum Markov games and Markov cooperative games, with players oblivious to the type of game being played.
arXiv Detail & Related papers (2022-02-08T20:09:47Z) - Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)