Learning in Multi-Player Stochastic Games
- URL: http://arxiv.org/abs/2210.14280v1
- Date: Tue, 25 Oct 2022 19:02:03 GMT
- Title: Learning in Multi-Player Stochastic Games
- Authors: William Brown
- Abstract summary: We consider the problem of simultaneous learning in games with many players in the finite-horizon setting.
While the typical target solution for a game is a Nash equilibrium, this is intractable with many players.
We turn to a different target: algorithms which generate an equilibrium when they are used by all players.
- Score: 1.0878040851638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of simultaneous learning in stochastic games with
many players in the finite-horizon setting. While the typical target solution
for a stochastic game is a Nash equilibrium, this is intractable with many
players. We instead focus on variants of {\it correlated equilibria}, such as
those studied for extensive-form games. We begin with a hardness result for the
adversarial MDP problem: even for a horizon of 3, obtaining sublinear regret
against the best non-stationary policy is \textsf{NP}-hard when both rewards
and transitions are adversarial. This implies that convergence to even the
weakest natural solution concept -- normal-form coarse correlated equilibrium --
is not possible via black-box reduction to a no-regret algorithm even in
stochastic games with constant horizon (unless
$\textsf{NP}\subseteq\textsf{BPP}$). Instead, we turn to a different target:
algorithms which {\it generate} an equilibrium when they are used by all
players. Our main result is an algorithm which generates an {\it extensive-form}
correlated equilibrium, whose runtime is exponential in the horizon but
polynomial in all other parameters. We give a similar algorithm which is
polynomial in all parameters for "fast-mixing" stochastic games. We also show a
method for efficiently reaching normal-form coarse correlated equilibria in
"single-controller" stochastic games which follows the traditional no-regret
approach. When shared randomness is available, the two generative algorithms
can be extended to give simultaneous regret bounds and converge in the
traditional sense.
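As context for the no-regret approach mentioned above, the following minimal Python sketch runs multiplicative-weights (Hedge) self-play in a small two-player normal-form game; the time-averaged joint distribution of play approximates a normal-form coarse correlated equilibrium as both players' regrets vanish. This is standard no-regret folklore rather than the paper's algorithm, and all names, payoffs, and parameters below are illustrative.
```python
import numpy as np

def hedge_self_play(A, B, T=5000, eta=0.1):
    """Multiplicative-weights (Hedge) self-play in a two-player normal-form game.

    A[i, j] is player 1's payoff and B[i, j] is player 2's payoff when player 1
    plays action i and player 2 plays action j.  The time-averaged joint
    distribution of play approaches a normal-form coarse correlated equilibrium
    as both players' regrets vanish (illustrative sketch, not the paper's method).
    """
    n, m = A.shape
    x = np.full(n, 1.0 / n)            # player 1's mixed strategy
    y = np.full(m, 1.0 / m)            # player 2's mixed strategy
    joint = np.zeros((n, m))           # running sum of joint play distributions
    for _ in range(T):
        joint += np.outer(x, y)
        u1 = A @ y                     # expected payoff of each action vs. opponent
        u2 = B.T @ x
        x = x * np.exp(eta * u1)
        x /= x.sum()
        y = y * np.exp(eta * u2)
        y /= y.sum()
    return joint / T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.random((3, 3)), rng.random((3, 3))
    sigma = hedge_self_play(A, B)
    print("approximate coarse correlated equilibrium:\n", np.round(sigma, 3))
```
Per the hardness result above, this style of black-box reduction to a no-regret learner cannot be carried over efficiently to adversarial stochastic games, even with constant horizon, unless $\textsf{NP}\subseteq\textsf{BPP}$.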
Related papers
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages, including logarithmic dependence on the size of the payoff matrix (a minimal OMWU sketch follows this entry).
We prove that a broad class of algorithms that do not quickly forget the past all suffer from the same issue of slow last-iterate convergence.
arXiv Detail & Related papers (2024-06-15T13:26:17Z)
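As a point of reference for the OMWU method named in the entry above, here is a minimal sketch of the Optimistic Multiplicative Weights Update in a two-player zero-sum matrix game, where each player's MWU step uses the optimistic gradient $2g_t - g_{t-1}$. The step size, horizon, and example game are illustrative assumptions, not values from that paper.
```python
import numpy as np

def omwu_zero_sum(A, T=2000, eta=0.05):
    """Optimistic Multiplicative Weights Update (OMWU) self-play, a sketch.

    Two-player zero-sum matrix game: player 1 (strategy x) receives x^T A y and
    player 2 (strategy y) receives -x^T A y.  Each player takes an MWU step
    driven by the optimistic gradient 2*g_t - g_{t-1}.
    """
    n, m = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    gx_prev, gy_prev = np.zeros(n), np.zeros(m)
    for _ in range(T):
        gx = A @ y                     # payoff gradient for the maximizing player
        gy = -A.T @ x                  # payoff gradient for the minimizing player
        x = x * np.exp(eta * (2 * gx - gx_prev))
        x /= x.sum()
        y = y * np.exp(eta * (2 * gy - gy_prev))
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

if __name__ == "__main__":
    # 2x2 zero-sum game whose unique mixed equilibrium is x* = (3/7, 4/7),
    # y* = (2/7, 5/7); the last iterates should drift toward it.
    A = np.array([[3.0, -1.0], [-2.0, 1.0]])
    x, y = omwu_zero_sum(A)
    print("last iterates:", np.round(x, 3), np.round(y, 3))
```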
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains [1.0878040851638]
We study games with independent chains and unknown transition matrices.
In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players.
We propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE policy (a generic mirror-descent step is sketched after this entry).
arXiv Detail & Related papers (2023-12-04T03:04:09Z)
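For orientation on the mirror-descent primitive named in the entry above, here is a generic sketch of one entropic (negative-entropy) mirror descent step on a probability simplex, i.e. the exponentiated-gradient update that a decentralized scheme would apply independently per player. It is not that paper's exact algorithm; the toy payoff vector and step size are illustrative.
```python
import numpy as np

def entropic_mirror_descent_step(policy, payoff_grad, eta=0.1):
    """One mirror descent step on the simplex with the negative-entropy mirror map.

    This is the exponentiated-gradient update; in a fully decentralized scheme
    each player would apply it to its own policy using only its own payoff
    feedback.  Generic sketch, not the cited paper's exact update rule.
    """
    new_policy = policy * np.exp(eta * payoff_grad)
    return new_policy / new_policy.sum()

if __name__ == "__main__":
    # Toy usage with a fixed (hypothetical) payoff vector: repeated steps
    # concentrate probability mass on the highest-payoff action.
    policy = np.full(4, 0.25)
    payoffs = np.array([0.1, 0.5, 0.3, 0.9])
    for _ in range(200):
        policy = entropic_mirror_descent_step(policy, payoffs)
    print(np.round(policy, 3))   # nearly all mass ends up on the last action
```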
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Global Nash Equilibrium in Non-convex Multi-player Game: Theory and Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, since no player can benefit from deviating unilaterally.
We develop the general theory and algorithms for reaching a global NE in such non-convex games step by step.
arXiv Detail & Related papers (2023-01-19T11:36:50Z)
- Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in \emph{all} games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z)
- An algorithmic solution to the Blotto game using multi-marginal couplings [27.35514815248438]
We describe an efficient algorithm to compute solutions for the general two-player Blotto game on n battlefields with heterogeneous values.
Our algorithm samples from an $\epsilon$-optimal solution in time $O(n^2 + \epsilon^{-4})$, independently of budgets and battlefield values.
In the case of asymmetric values where optimal solutions need not exist but Nash equilibria do, our algorithm samples from an $\epsilon$-Nash equilibrium with similar complexity.
arXiv Detail & Related papers (2022-02-15T11:07:31Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.