On the Complexity of Computing Markov Perfect Equilibrium in General-Sum
Stochastic Games
- URL: http://arxiv.org/abs/2109.01795v1
- Date: Sat, 4 Sep 2021 05:47:59 GMT
- Title: On the Complexity of Computing Markov Perfect Equilibrium in General-Sum
Stochastic Games
- Authors: Xiaotie Deng, Yuhao Li, David Henry Mguni, Jun Wang, Yaodong Yang
- Abstract summary: Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions.
We show that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state discounted Stochastic Game to exponential precision is \textbf{PPAD}-complete.
Our results indicate that finding an MPE in SGs is highly unlikely to be \textbf{NP}-hard unless \textbf{NP}=\textbf{co-NP}.
- Score: 18.48133964089095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Similar to the role of Markov decision processes in reinforcement learning,
Stochastic Games (SGs) lay the foundation for the study of multi-agent
reinforcement learning (MARL) and sequential agent interactions. In this paper,
we derive that computing an approximate Markov Perfect Equilibrium (MPE) in a
finite-state discounted Stochastic Game to exponential precision is
\textbf{PPAD}-complete. We adopt a function with a polynomially bounded
description in the strategy space to convert the MPE computation to a
fixed-point problem, even though the stochastic game may demand an exponential
number of pure strategies, in the number of states, for each agent. The
completeness result follows from a reduction of the fixed-point problem to {\sc
End of the Line}. Our results indicate that finding an MPE in SGs is highly
unlikely to be \textbf{NP}-hard unless \textbf{NP}=\textbf{co-NP}. Our work
offers confidence for MARL research to study MPE computation on general-sum SGs
and to develop algorithms for them as fruitful as those currently available for zero-sum SGs.
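To make the equilibrium notion above concrete: a stationary Markov policy profile is an ε-MPE if, at every state, no player can improve its discounted value by more than ε by deviating in the single-agent MDP obtained by fixing the other players' Markov policies. The sketch below checks this condition on a toy two-player game; the game data, helper names such as `is_eps_mpe`, and the use of plain value iteration are illustrative assumptions, not the paper's fixed-point construction or its reduction to {\sc End of the Line}.

```python
import numpy as np

# Toy 2-player, 2-state, 2-action discounted stochastic game with random data.
# All numbers and names here are illustrative only.
n_states, n_actions, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)

# r[i][s, a0, a1]: reward to player i; P[s, a0, a1, s']: transition kernel.
r = [rng.uniform(0, 1, (n_states, n_actions, n_actions)) for _ in range(2)]
P = rng.uniform(0, 1, (n_states, n_actions, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)

def induced_mdp(i, pi_other):
    """Fixing the other player's stationary Markov policy leaves player i an MDP."""
    R = np.zeros((n_states, n_actions))
    T = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for ai in range(n_actions):
            for aj in range(n_actions):
                joint = (ai, aj) if i == 0 else (aj, ai)
                R[s, ai] += pi_other[s, aj] * r[i][s][joint]
                T[s, ai] += pi_other[s, aj] * P[s][joint]
    return R, T

def best_response_value(R, T, iters=2000):
    """Optimal value of the induced MDP, computed by value iteration."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V = (R + gamma * T @ V).max(axis=1)
    return V

def policy_value(i, pi_i, pi_other, iters=2000):
    """Value to player i when both players follow the given Markov policies."""
    R, T = induced_mdp(i, pi_other)
    V = np.zeros(n_states)
    for _ in range(iters):
        V = (pi_i * (R + gamma * T @ V)).sum(axis=1)
    return V

def is_eps_mpe(pi, eps):
    """eps-MPE check: at every state, no player gains more than eps by
    deviating to a best response against the other's fixed Markov policy."""
    for i in range(2):
        R, T = induced_mdp(i, pi[1 - i])
        gap = best_response_value(R, T) - policy_value(i, pi[i], pi[1 - i])
        if gap.max() > eps:
            return False
    return True

uniform = np.full((n_states, n_actions), 0.5)
print(is_eps_mpe([uniform, uniform], eps=0.05))
```

Verifying a given profile this way is the easy direction; the paper's result concerns the complexity of finding such a profile in the first place.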
Related papers
- Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs)
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU)
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - Breaking the Curse of Multiagents in a Large State Space: RL in Markov
Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; and (2) separating learning Markov equilibria and exploration in the Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z) - The Complexity of Markov Equilibrium in Stochastic Games [44.77547027158141]
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable.
Our results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable.
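The contrast with normal-form games can be made concrete: there, a coarse correlated equilibrium is a distribution over joint action profiles under which no player gains, in expectation, by switching to any fixed action, and one can be found by linear programming. A minimal sketch follows (the 2x2 payoffs and all names are illustrative, not taken from either paper):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Illustrative 2x2 normal-form game (prisoner's-dilemma-like, chosen arbitrarily).
# u[i][a0, a1] is player i's payoff under joint action (a0, a1).
u = [np.array([[3.0, 0.0], [5.0, 1.0]]),
     np.array([[3.0, 5.0], [0.0, 1.0]])]
n0, n1 = u[0].shape
joint = list(itertools.product(range(n0), range(n1)))

# CCE constraints: for each player i and each fixed deviation d,
#   sum_a x(a) * (u_i(d, a_{-i}) - u_i(a)) <= 0.
rows = []
for i, n_i in enumerate((n0, n1)):
    for d in range(n_i):
        row = []
        for a in joint:
            a_dev = (d, a[1]) if i == 0 else (a[0], d)
            row.append(u[i][a_dev] - u[i][a])
        rows.append(row)

res = linprog(
    c=np.zeros(len(joint)),                      # pure feasibility problem
    A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
    A_eq=np.ones((1, len(joint))), b_eq=[1.0],   # probabilities sum to one
    bounds=[(0, 1)] * len(joint), method="highs",
)
print(dict(zip(joint, np.round(res.x, 3))))      # a CCE over joint actions
```

The intractability result above says that no comparably efficient procedure is expected once one asks for approximate stationary Markov CCE in general-sum stochastic games.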
arXiv Detail & Related papers (2022-04-08T10:51:01Z) - Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games [3.441021278275805]
Learning in games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL)
We establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class of general-sum stochastic games (SGs)
We focus on the practical while challenging setting of fully decentralized MARL, where neither the rewards nor the actions of other agents can be observed by each agent.
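For reference on what "fully decentralized" means here, a generic independent tabular Q-learning update (a simplified sketch of the setting, not the specific algorithm analyzed in the paper) touches only the state, the agent's own action, and its own reward:

```python
import numpy as np

def independent_q_update(Q_i, s, a_i, r_i, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step for agent i, using only its own information:
    the other agents' actions and rewards never appear in the update."""
    td_target = r_i + gamma * Q_i[s_next].max()
    Q_i[s, a_i] += alpha * (td_target - Q_i[s, a_i])
    return Q_i

# Each agent keeps its own table over (state, own-action) pairs.
n_states, n_actions = 3, 2
Q = [np.zeros((n_states, n_actions)) for _ in range(2)]
Q[0] = independent_q_update(Q[0], s=0, a_i=1, r_i=0.5, s_next=2)
```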
arXiv Detail & Related papers (2021-12-15T03:33:39Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Gradient play in stochastic games: stationary points, convergence, and
sample complexity [6.97785632069611]
We study the performance of the gradient play algorithm for stochastic games (SGs)
We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting.
For a subclass of SGs called Markov potential games, we design a sample-based reinforcement learning algorithm.
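In the direct-parameterization form this summary refers to (notation introduced here for illustration, not quoted from the paper), gradient play has each agent $i$ run projected gradient ascent on its own value while the others' policies are held fixed, $\pi_i^{(t+1)} = \mathrm{Proj}_{\Delta(\mathcal{A}_i)^{|\mathcal{S}|}}\big(\pi_i^{(t)} + \eta\,\nabla_{\pi_i} V_i(\pi_i^{(t)}, \pi_{-i}^{(t)})\big)$; a policy profile is then first-order stationary exactly when this update leaves every agent's policy unchanged.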
arXiv Detail & Related papers (2021-06-01T03:03:45Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs)
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)