Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic
Games with Independent Chains
- URL: http://arxiv.org/abs/2201.12224v4
- Date: Wed, 22 Mar 2023 02:33:47 GMT
- Title: Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic
Games with Independent Chains
- Authors: S. Rasoul Etesami
- Abstract summary: We consider a class of $n$-player games in which players have their own internal state/action spaces while they are coupled through their payoff functions.
For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is intractable.
We develop algorithms based on dual averaging and dual mirror descent, which converge to the set of $\epsilon$-NE policies.
- Score: 2.132096006921048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a subclass of $n$-player stochastic games, in which players have
their own internal state/action spaces while they are coupled through their
payoff functions. It is assumed that players' internal chains are driven by
independent transition probabilities. Moreover, players can receive only
realizations of their payoffs, not the actual functions, and cannot observe
each other's states/actions. For this class of games, we first show that
finding a stationary Nash equilibrium (NE) policy without any assumption on the
reward functions is intractable. However, for general reward functions, we
develop polynomial-time learning algorithms based on dual averaging and dual
mirror descent, which converge in terms of the averaged Nikaido-Isoda distance
to the set of $\epsilon$-NE policies almost surely or in expectation. In
particular, under extra assumptions on the reward functions such as social
concavity, we derive polynomial upper bounds on the number of iterates to
achieve an $\epsilon$-NE policy with high probability. Finally, we evaluate the
effectiveness of the proposed algorithms in learning $\epsilon$-NE policies
using numerical experiments for energy management in smart grids.
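The abstract describes fully decentralized learners, based on dual averaging and dual mirror descent, that use only payoff realizations and converge to $\epsilon$-NE policies in the averaged Nikaido-Isoda distance. As a rough illustration only (not the paper's algorithm), the sketch below shows one player's dual-averaging loop under bandit payoff feedback; it assumes a single-state simplex decision set, an entropic mirror map, a $1/\sqrt{t}$ step size, and a hypothetical `payoff_oracle` returning noisy payoffs in [0, 1], whereas the paper's methods operate on state-action occupation measures of each player's independent Markov chain with the step-size schedules analyzed there.

```python
import numpy as np

def dual_averaging_player(num_actions, num_rounds, payoff_oracle, seed=0):
    """Single-player dual-averaging sketch with bandit payoff feedback.

    Illustrative assumptions (not from the paper): simplex decision set,
    entropic mirror map (softmax), and a hypothetical `payoff_oracle(action)`
    returning a noisy payoff realization in [0, 1].
    """
    rng = np.random.default_rng(seed)
    dual = np.zeros(num_actions)        # accumulated payoff-gradient estimates
    avg_policy = np.zeros(num_actions)  # running average of played policies

    for t in range(1, num_rounds + 1):
        step = 1.0 / np.sqrt(t)                      # diminishing step size
        logits = step * dual
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()                       # entropic mirror map onto the simplex

        action = rng.choice(num_actions, p=policy)
        payoff = payoff_oracle(action)               # only a realization is observed

        grad_est = np.zeros(num_actions)             # importance-weighted gradient estimate
        grad_est[action] = payoff / max(policy[action], 1e-12)
        dual += grad_est                             # dual-averaging accumulation

        avg_policy += (policy - avg_policy) / t      # time-averaged iterate is reported

    return avg_policy
```

In the game setting, every player would run such a loop simultaneously and independently using only its own observed payoffs, and the reported iterate is the time-averaged policy, which matches the averaged-distance flavor of the guarantee stated in the abstract.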
Related papers
- On Tractable $Φ$-Equilibria in Non-Concave Games [53.212133025684224]
Non-concave games introduce significant game-theoretic and optimization challenges.
We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $\Phi$-equilibria.
We also show that Online Gradient Descent can efficiently approximate $\Phi$-equilibria in non-trivial regimes.
arXiv Detail & Related papers (2024-03-13T01:51:30Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Scalable and Independent Learning of Nash Equilibrium Policies in
$n$-Player Stochastic Games with Unknown Independent Chains [1.0878040851638]
We study games with independent chains and unknown transition matrices.
In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players.
We propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE policy.
arXiv Detail & Related papers (2023-12-04T03:04:09Z) - Global Nash Equilibrium in Non-convex Multi-player Game: Theory and
Algorithms [66.8634598612777]
We consider the Nash equilibrium (NE), which is acceptable to all players in a multi-player game since no player can benefit from unilaterally changing its own strategy.
We develop theory and algorithms for computing a global NE in non-convex multi-player games.
arXiv Detail & Related papers (2023-01-19T11:36:50Z) - Policy Mirror Ascent for Efficient and Independent Learning in Mean
Field Games [35.86199604587823]
Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games.
We show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples.
arXiv Detail & Related papers (2022-12-29T20:25:18Z) - Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy with a minimax-optimal sample complexity.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z) - Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in all games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Approximating Discontinuous Nash Equilibrial Values of Two-Player
General-Sum Differential Games [21.291449080239673]
This paper extends previous state-of-the-art results on zero-sum games with continuous values to general-sum games with discontinuous values, where the discontinuity is caused by discontinuities in the players' losses.
We show that due to its lack of convergence proof and generalization analysis on discontinuous losses, the existing self-supervised learning technique fails to generalize and raises safety concerns in an autonomous driving application.
Our solution is to first pre-train the value network on supervised Nash equilibria, and then refine it by minimizing a loss that combines the supervised data with the PDE and boundary conditions; a rough sketch of such a combined loss appears after this list.
arXiv Detail & Related papers (2022-07-05T02:22:05Z) - Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE).
We show that computing an NE for this class of games is hard for the complexity class $\mathrm{CLS}$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z) - On Reinforcement Learning for Turn-based Zero-sum Markov Games [24.071036229246644]
We consider the problem of finding Nash equilibrium for two-player turn-based zero-sum games.
Inspired by the AlphaGo Zero (AGZ) algorithm, we develop a Reinforcement Learning based approach.
arXiv Detail & Related papers (2020-02-25T01:40:31Z)
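The pre-train-then-refine recipe mentioned in the entry on discontinuous Nash equilibrial values above can be illustrated with the hypothetical combined loss below. The network architecture, loss weights, and the `pde_residual` / `boundary_value` callbacks are placeholders supplied by the user, not the paper's actual value PDE or model; pre-training would minimize only the supervised term, and the refinement stage then minimizes the full combination.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Small MLP mapping (time, joint state) to the two players' values (illustrative)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 2),
        )

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1))

def hybrid_loss(model, sup_batch, pde_batch, bc_batch,
                pde_residual, boundary_value,
                w_sup=1.0, w_pde=1.0, w_bc=1.0):
    """Supervised + PDE-residual + boundary-condition loss (weights are placeholders).

    sup_batch = (t, x, v_star): supervised Nash-equilibrium values.
    pde_residual(model, t, x): residual of the assumed value PDE.
    boundary_value(x): assumed terminal/boundary condition.
    """
    t_s, x_s, v_star = sup_batch
    sup = ((model(t_s, x_s) - v_star) ** 2).mean()              # fit supervised NE values

    t_p, x_p = pde_batch
    pde = (pde_residual(model, t_p, x_p) ** 2).mean()            # enforce the value PDE

    t_b, x_b = bc_batch
    bc = ((model(t_b, x_b) - boundary_value(x_b)) ** 2).mean()   # enforce boundary condition

    return w_sup * sup + w_pde * pde + w_bc * bc
```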