Scalable and Independent Learning of Nash Equilibrium Policies in
$n$-Player Stochastic Games with Unknown Independent Chains
- URL: http://arxiv.org/abs/2312.01587v1
- Date: Mon, 4 Dec 2023 03:04:09 GMT
- Title: Scalable and Independent Learning of Nash Equilibrium Policies in
$n$-Player Stochastic Games with Unknown Independent Chains
- Authors: Tiancheng Qin and S. Rasoul Etesami
- Abstract summary: We study games with independent chains and unknown transition matrices.
In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players.
We propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE policy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a subclass of $n$-player stochastic games, namely, stochastic games
with independent chains and unknown transition matrices. In this class of
games, players control their own internal Markov chains whose transitions do
not depend on the states/actions of other players. However, players' decisions
are coupled through their payoff functions. We assume players can receive only
realizations of their payoffs, and that the players cannot observe the states
and actions of other players, nor do they know the transition probability
matrices of their own Markov chain. Relying on a compact dual formulation of
the game based on occupancy measures and the technique of confidence set to
maintain high-probability estimates of the unknown transition matrices, we
propose a fully decentralized mirror descent algorithm to learn an
$\epsilon$-NE for this class of games. The proposed algorithm has the desired
properties of independence, scalability, and convergence. Specifically, under
no assumptions on the reward functions, we show the proposed algorithm
converges in polynomial time in a weaker distance (namely, the averaged
Nikaido-Isoda gap) to the set of $\epsilon$-NE policies with arbitrarily high
probability. Moreover, assuming the existence of a variationally stable Nash
equilibrium policy, we show that the proposed algorithm converges
asymptotically to the stable $\epsilon$-NE policy with arbitrarily high
probability. In addition to Markov potential games and linear-quadratic
stochastic games, this work provides another subclass of $n$-player stochastic
games that, under some mild assumptions, admit polynomial-time learning
algorithms for finding their stationary $\epsilon$-NE policies.
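The abstract measures convergence via the Nikaido-Isoda (NI) gap. As a standard reference point (not this paper's notation), for payoffs $u_i$ and joint policy $\pi$ the NI function is $\psi(\pi, \pi') = \sum_{i=1}^{n} \big( u_i(\pi'_i, \pi_{-i}) - u_i(\pi) \big)$, and $\max_{\pi'} \psi(\pi, \pi')$ vanishes exactly at a Nash equilibrium, so driving its (averaged) value below $\epsilon$ certifies an $\epsilon$-NE.
The sketch below illustrates only the mirror-descent ingredient in a toy setting: each player runs an entropic (exponentiated-gradient) update on a distribution over its own state-action pairs, independently of the other player's internals. The occupancy-polytope constraints, the confidence sets for the unknown transition matrices, and all constants are omitted or invented here; this is a minimal illustration under those simplifying assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)


def entropic_mirror_step(d, grad, eta):
    """One exponentiated-gradient (entropic mirror descent) step on the simplex."""
    logits = np.log(d) + eta * grad   # ascent on this player's payoff
    logits -= logits.max()            # stabilize before exponentiating
    d_new = np.exp(logits)
    return d_new / d_new.sum()


# Toy two-player zero-sum game over each player's own (state, action) pairs;
# the payoff matrix below is made up purely for illustration.
n_sa = 6                                    # stand-in for |S_i| x |A_i|
payoff = rng.standard_normal((n_sa, n_sa))  # player 1 gets d1' payoff d2, player 2 the negative

d1 = np.full(n_sa, 1.0 / n_sa)              # uniform initial "occupancy-like" measures
d2 = np.full(n_sa, 1.0 / n_sa)
d1_avg = np.zeros(n_sa)
d2_avg = np.zeros(n_sa)

T = 5000
for t in range(T):
    eta_t = 0.5 / np.sqrt(t + 1)            # decreasing step size
    g1 = payoff @ d2                        # player 1's payoff gradient w.r.t. d1
    g2 = -payoff.T @ d1                     # player 2's payoff gradient w.r.t. d2
    d1 = entropic_mirror_step(d1, g1, eta_t)
    d2 = entropic_mirror_step(d2, g2, eta_t)
    d1_avg += d1 / T
    d2_avg += d2 / T

# Exploitability of the averaged iterates: the total gain available from unilateral
# deviations.  This plays the role of an equilibrium gap, loosely analogous in
# spirit to the averaged Nikaido-Isoda gap mentioned in the abstract.
gap = ((payoff @ d2_avg).max() - d1_avg @ payoff @ d2_avg) \
    + ((-payoff.T @ d1_avg).max() - d2_avg @ (-payoff.T @ d1_avg))
print(f"equilibrium gap of averaged play: {gap:.4f}")
```

The time-averaged iterates are reported because, for decentralized no-regret dynamics of this kind, averaged play is what typically carries the equilibrium-gap guarantee, mirroring the "averaged" qualifier in the abstract.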
Related papers
- Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games [31.554420227087043]
We develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players.
In the matrix game setting, the results imply a complexity of $O(\epsilon^{-1})$ to find the Nash distribution.
In the stochastic game setting, the results also imply a complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium.
arXiv Detail & Related papers (2024-09-02T20:07:25Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Breaking the Curse of Multiagents in a Large State Space: RL in Markov
Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle the non-stationarity incurred by multiple agents and the use of function approximation; and (2) separating the learning of Markov equilibria from exploration in Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z) - Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z) - Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in all games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Independent Policy Gradient for Large-Scale Markov Potential Games:
Sharper Rates, Function Approximation, and Game-Agnostic Convergence [30.084357461497042]
We learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large.
We propose new independent policy gradient algorithms that are run by all players in tandem.
We identify a class of independent policy gradient algorithms that enjoys convergence for both zero-sum Markov games and Markov cooperative games when the players are oblivious to the type of game being played.
arXiv Detail & Related papers (2022-02-08T20:09:47Z) - Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic
Games with Independent Chains [2.132096006921048]
We consider a class of $n$-player games in which players have their own internal state/action spaces while they are coupled through their payoff functions.
For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is computationally intractable.
We develop algorithms based on dual averaging and dual mirror descent, which converge to the set of $\epsilon$-NE policies.
arXiv Detail & Related papers (2022-01-28T16:27:21Z) - An Independent Learning Algorithm for a Class of Symmetric Stochastic
Games [0.0]
This paper investigates the feasibility of using independent learners to find approximate equilibrium policies in non-episodic, discounted games.
We present an independent learning algorithm that comes with high probability guarantees of approximate equilibrium in this class of games.
arXiv Detail & Related papers (2021-10-09T19:57:21Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function
Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)