Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games
- URL: http://arxiv.org/abs/2112.07859v2
- Date: Thu, 16 Dec 2021 18:14:38 GMT
- Title: Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games
- Authors: Zuguang Gao, Qianqian Ma, Tamer Ba\c{s}ar, John R. Birge
- Abstract summary: Learning in stochastic games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL).
We establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class of general-sum stochastic games (SGs).
We focus on the practical yet challenging setting of fully decentralized MARL, where neither the rewards nor the actions of other agents can be observed by each agent.
- Score: 3.441021278275805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning in stochastic games is arguably the most standard and fundamental
setting in multi-agent reinforcement learning (MARL). In this paper, we
consider decentralized MARL in stochastic games in the non-asymptotic regime.
In particular, we establish the finite-sample complexity of fully decentralized
Q-learning algorithms in a significant class of general-sum stochastic games
(SGs) - weakly acyclic SGs, which includes the common cooperative MARL setting
with an identical reward to all agents (a Markov team problem) as a special
case. We focus on the practical yet challenging setting of fully
decentralized MARL, where no agent can observe the rewards or actions of the
other agents; in fact, each agent is completely oblivious to the presence of
other decision makers. We consider both the tabular and the linear function
approximation cases. In the tabular setting, we analyze
the sample complexity for the decentralized Q-learning algorithm to converge to
a Markov perfect equilibrium (Nash equilibrium). With linear function
approximation, the results concern convergence to a linear approximated
equilibrium - a new notion of equilibrium that we propose - in which each
agent's policy is a best reply (to the other agents' policies) within a linear
space. Numerical experiments are also provided for both settings to demonstrate
the results.
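To make the fully decentralized setting concrete, below is a minimal sketch of independent tabular Q-learning on a toy Markov team problem: each agent updates its own Q-table from the shared reward and the observed state only, never seeing the other agent's action or even knowing the other agent exists. The environment, step size, and exploration rate are illustrative assumptions; the algorithm analyzed in the paper additionally uses exploration phases and inertial best-reply policy updates, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-state, two-action Markov team problem (illustrative, not from the paper):
# both agents receive the same reward but never observe each other's action.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9

def team_reward(s, a0, a1):
    # Identical reward to both agents; coordinating in state 0 pays the most.
    return 1.0 if (s == 0 and a0 == a1) else 0.1

def next_state(s, a0, a1):
    # The joint action drives the transition, but each agent only sees the state.
    return (s + a0 + a1) % N_STATES

# Each agent keeps a Q-table over (state, own action) only -- it is oblivious
# to the presence of the other decision maker.
Q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(2)]
alpha, eps = 0.1, 0.2   # illustrative step size and exploration rate
s = 0
for t in range(20000):
    acts = [int(rng.integers(N_ACTIONS)) if rng.random() < eps
            else int(np.argmax(Q[i][s]))
            for i in range(2)]
    r = team_reward(s, *acts)
    s_next = next_state(s, *acts)
    for i in range(2):
        # Standard single-agent Q-learning update from agent i's own viewpoint.
        td = r + GAMMA * Q[i][s_next].max() - Q[i][s, acts[i]]
        Q[i][s, acts[i]] += alpha * td
    s = s_next

print("greedy action per state:",
      [(int(np.argmax(Q[0][x])), int(np.argmax(Q[1][x]))) for x in range(N_STATES)])
```

A linear function approximation variant would replace each agent's Q-table with a weight vector over features of (state, own action), which is the setting in which the paper's linear approximated equilibrium is defined.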
Related papers
- Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the stability and generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topologies on the generalization of D-SGDA.
We also balance the optimization error against the generalization error to obtain the optimal population risk in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games
to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity of multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making, but with hidden (unobserved) rewards.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential
Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios.
arXiv Detail & Related papers (2023-04-20T14:47:54Z) - Breaking the Curse of Multiagents in a Large State Space: RL in Markov
Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) using policy replay to tackle the non-stationarity incurred by multiple agents and the use of function approximation; and (2) separating the learning of Markov equilibria from exploration in the Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z) - The Complexity of Markov Equilibrium in Stochastic Games [44.77547027158141]
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable.
Our results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable.
arXiv Detail & Related papers (2022-04-08T10:51:01Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Optimal Algorithms for Decentralized Stochastic Variational Inequalities [113.43047601775453]
This work concentrates on the decentralized setting, which is increasingly important but not well understood.
We present lower bounds for both communication and local iterations and construct optimal algorithms that match these lower bounds.
Our algorithms are the best among those in the available literature, not only in the decentralized stochastic case, but also in the deterministic and non-distributed cases.
arXiv Detail & Related papers (2022-02-06T13:14:02Z) - On the Complexity of Computing Markov Perfect Equilibrium in General-Sum
Stochastic Games [18.48133964089095]
Stochastic games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions.
We prove that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state SG within exponential precision is PPAD-complete.
Our results indicate that finding an MPE in SGs is highly unlikely to be NP-hard unless NP = co-NP.
arXiv Detail & Related papers (2021-09-04T05:47:59Z) - Gradient play in stochastic games: stationary points, convergence, and
sample complexity [6.97785632069611]
We study the performance of the gradient play algorithm for stochastic games (SGs).
We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting.
For a subclass of SGs called Markov potential games, we design a sample-based reinforcement learning algorithm (a minimal single-state sketch of gradient play appears after this list).
arXiv Detail & Related papers (2021-06-01T03:03:45Z) - Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z) - Calibration of Shared Equilibria in General Sum Partially Observable
Markov Games [15.572157454411533]
We consider a general sum partially observable Markov game where agents of different types share a single policy network.
This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets.
arXiv Detail & Related papers (2020-06-23T15:14:20Z)
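As a concrete companion to the gradient play entry above, the sketch below runs projected gradient play in a single-state identical-interest matrix game, the simplest instance of a Markov potential game. The payoff matrix, step size, and initialization are illustrative assumptions; the full stochastic-game algorithm and its sample complexity analysis are not reproduced here.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (Duchi et al., 2008).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

# Hypothetical 2x2 identical-interest (potential) game: both players receive A[a0, a1].
A = np.array([[4.0, 0.0],
              [0.0, 3.0]])

x0 = np.full(2, 0.5)   # player 0's mixed strategy
x1 = np.full(2, 0.5)   # player 1's mixed strategy
eta = 0.05             # illustrative step size
for _ in range(2000):
    # Partial gradients of the shared payoff x0^T A x1 with respect to each strategy.
    g0, g1 = A @ x1, A.T @ x0
    # Gradient play: each player ascends its own payoff and projects back to the simplex.
    x0 = project_simplex(x0 + eta * g0)
    x1 = project_simplex(x1 + eta * g1)

print("x0 =", np.round(x0, 3), "x1 =", np.round(x1, 3))
```

In this toy run both players converge to the pure strategy on the first action, which is a Nash equilibrium of the matrix game and a fixed point of the projected dynamics.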