Learning in Multi-Player Stochastic Games
- URL: http://arxiv.org/abs/2210.14280v1
- Date: Tue, 25 Oct 2022 19:02:03 GMT
- Title: Learning in Multi-Player Stochastic Games
- Authors: William Brown
- Abstract summary: We consider the problem of simultaneous learning in games with many players in the finite-horizon setting.
While the typical target solution for a game is a Nash equilibrium, this is intractable with many players.
We turn to a different target: algorithms which generate an equilibrium when they are used by all players.
- Score: 1.0878040851638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of simultaneous learning in stochastic games with
many players in the finite-horizon setting. While the typical target solution
for a stochastic game is a Nash equilibrium, this is intractable with many
players. We instead focus on variants of {\it correlated equilibria}, such as
those studied for extensive-form games. We begin with a hardness result for the
adversarial MDP problem: even for a horizon of 3, obtaining sublinear regret
against the best non-stationary policy is \textsf{NP}-hard when both rewards
and transitions are adversarial. This implies that convergence to even the
weakest natural solution concept -- normal-form coarse correlated equilibrium --
is not possible via black-box reduction to a no-regret algorithm even in
stochastic games with constant horizon (unless
$\textsf{NP}\subseteq\textsf{BPP}$). Instead, we turn to a different target:
algorithms which {\it generate} an equilibrium when they are used by all
players. Our main result is an algorithm which generates an {\it extensive-form}
correlated equilibrium, whose runtime is exponential in the horizon but
polynomial in all other parameters. We give a similar algorithm which is
polynomial in all parameters for "fast-mixing" stochastic games. We also show a
method for efficiently reaching normal-form coarse correlated equilibria in
"single-controller" stochastic games which follows the traditional no-regret
approach. When shared randomness is available, the two generative algorithms
can be extended to give simultaneous regret bounds and converge in the
traditional sense.
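As context for the no-regret approach mentioned above, the following minimal Python sketch runs multiplicative-weights (Hedge) self-play in a small two-player normal-form game; the time-averaged joint distribution of play approximates a normal-form coarse correlated equilibrium as both players' regrets vanish. This is standard no-regret folklore rather than the paper's algorithm, and all names, payoffs, and parameters below are illustrative.
```python
import numpy as np

def hedge_self_play(A, B, T=5000, eta=0.1):
    """Multiplicative-weights (Hedge) self-play in a two-player normal-form game.

    A[i, j] is player 1's payoff and B[i, j] is player 2's payoff when player 1
    plays action i and player 2 plays action j.  The time-averaged joint
    distribution of play approaches a normal-form coarse correlated equilibrium
    as both players' regrets vanish (illustrative sketch, not the paper's method).
    """
    n, m = A.shape
    x = np.full(n, 1.0 / n)            # player 1's mixed strategy
    y = np.full(m, 1.0 / m)            # player 2's mixed strategy
    joint = np.zeros((n, m))           # running sum of joint play distributions
    for _ in range(T):
        joint += np.outer(x, y)
        u1 = A @ y                     # expected payoff of each action vs. opponent
        u2 = B.T @ x
        x = x * np.exp(eta * u1)
        x /= x.sum()
        y = y * np.exp(eta * u2)
        y /= y.sum()
    return joint / T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.random((3, 3)), rng.random((3, 3))
    sigma = hedge_self_play(A, B)
    print("approximate coarse correlated equilibrium:\n", np.round(sigma, 3))
```
Per the hardness result above, this style of black-box reduction to a no-regret learner cannot be carried over efficiently to adversarial stochastic games, even with constant horizon, unless $\textsf{NP}\subseteq\textsf{BPP}$.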
Related papers
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages, including logarithmic dependence on the size of the payoff matrix (a minimal OMWU sketch follows this entry).
We prove that a broad class of algorithms that do not quickly forget the past all suffer from the same issue of slow last-iterate convergence.
arXiv Detail & Related papers (2024-06-15T13:26:17Z)
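As a point of reference for the OMWU method named in the entry above, here is a minimal sketch of the Optimistic Multiplicative Weights Update in a two-player zero-sum matrix game, where each player's MWU step uses the optimistic gradient $2g_t - g_{t-1}$. The step size, horizon, and example game are illustrative assumptions, not values from that paper.
```python
import numpy as np

def omwu_zero_sum(A, T=2000, eta=0.05):
    """Optimistic Multiplicative Weights Update (OMWU) self-play, a sketch.

    Two-player zero-sum matrix game: player 1 (strategy x) receives x^T A y and
    player 2 (strategy y) receives -x^T A y.  Each player takes an MWU step
    driven by the optimistic gradient 2*g_t - g_{t-1}.
    """
    n, m = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    gx_prev, gy_prev = np.zeros(n), np.zeros(m)
    for _ in range(T):
        gx = A @ y                     # payoff gradient for the maximizing player
        gy = -A.T @ x                  # payoff gradient for the minimizing player
        x = x * np.exp(eta * (2 * gx - gx_prev))
        x /= x.sum()
        y = y * np.exp(eta * (2 * gy - gy_prev))
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

if __name__ == "__main__":
    # 2x2 zero-sum game whose unique mixed equilibrium is x* = (3/7, 4/7),
    # y* = (2/7, 5/7); the last iterates should drift toward it.
    A = np.array([[3.0, -1.0], [-2.0, 1.0]])
    x, y = omwu_zero_sum(A)
    print("last iterates:", np.round(x, 3), np.round(y, 3))
```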
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains [1.0878040851638]
We study games with independent chains and unknown transition matrices.
In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players.
We propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE policy (a generic mirror-descent step is sketched after this entry).
arXiv Detail & Related papers (2023-12-04T03:04:09Z)
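For orientation on the mirror-descent primitive named in the entry above, here is a generic sketch of one entropic (negative-entropy) mirror descent step on a probability simplex, i.e. the exponentiated-gradient update that a decentralized scheme would apply independently per player. It is not that paper's exact algorithm; the toy payoff vector and step size are illustrative.
```python
import numpy as np

def entropic_mirror_descent_step(policy, payoff_grad, eta=0.1):
    """One mirror descent step on the simplex with the negative-entropy mirror map.

    This is the exponentiated-gradient update; in a fully decentralized scheme
    each player would apply it to its own policy using only its own payoff
    feedback.  Generic sketch, not the cited paper's exact update rule.
    """
    new_policy = policy * np.exp(eta * payoff_grad)
    return new_policy / new_policy.sum()

if __name__ == "__main__":
    # Toy usage with a fixed (hypothetical) payoff vector: repeated steps
    # concentrate probability mass on the highest-payoff action.
    policy = np.full(4, 0.25)
    payoffs = np.array([0.1, 0.5, 0.3, 0.9])
    for _ in range(200):
        policy = entropic_mirror_descent_step(policy, payoffs)
    print(np.round(policy, 3))   # nearly all mass ends up on the last action
```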
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Global Nash Equilibrium in Non-convex Multi-player Game: Theory and Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, since no player can benefit from deviating unilaterally.
We develop the general theory and algorithms for reaching a global NE in such non-convex games step by step.
arXiv Detail & Related papers (2023-01-19T11:36:50Z)
- Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in \emph{all} games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z)
- An algorithmic solution to the Blotto game using multi-marginal couplings [27.35514815248438]
We describe an efficient algorithm to compute solutions for the general two-player Blotto game on n battlefields with heterogeneous values.
Our algorithm samples from an $\epsilon$-optimal solution in time $O(n^2 + \epsilon^{-4})$, independently of budgets and battlefield values.
In the case of asymmetric values where optimal solutions need not exist but Nash equilibria do, our algorithm samples from an $\epsilon$-Nash equilibrium with similar complexity.
arXiv Detail & Related papers (2022-02-15T11:07:31Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.