Pipeline PSRO: A Scalable Approach for Finding Approximate Nash
Equilibria in Large Games
- URL: http://arxiv.org/abs/2006.08555v2
- Date: Thu, 18 Feb 2021 19:26:01 GMT
- Title: Pipeline PSRO: A Scalable Approach for Finding Approximate Nash
Equilibria in Large Games
- Authors: Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi
- Abstract summary: Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm guaranteed to converge to an approximate Nash equilibrium.
We introduce Pipeline PSRO, the first scalable general method for finding approximate Nash equilibria in large games.
We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.
- Score: 11.866835246140647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding approximate Nash equilibria in zero-sum imperfect-information games
is challenging when the number of information states is large. Policy Space
Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in
game theory that is guaranteed to converge to an approximate Nash equilibrium.
However, PSRO requires training a reinforcement learning policy at each
iteration, making it too slow for large games. We show through counterexamples
and experiments that DCH and Rectified PSRO, two existing approaches to scaling
up PSRO, fail to converge even in small games. We introduce Pipeline PSRO
(P2SRO), the first scalable general method for finding approximate Nash
equilibria in large zero-sum imperfect-information games. P2SRO is able to
parallelize PSRO with convergence guarantees by maintaining a hierarchical
pipeline of reinforcement learning workers, each training against the policies
generated by lower levels in the hierarchy. We show that unlike existing
methods, P2SRO converges to an approximate Nash equilibrium, and does so faster
as the number of parallel workers increases, across a variety of imperfect
information games. We also introduce an open-source environment for Barrage
Stratego, a variant of Stratego with an approximate game tree complexity of
$10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage
Stratego and beats all existing bots. Experiment code is available
at https://github.com/JBLanier/pipeline-psro.
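The hierarchical pipeline described in the abstract can be summarized in a short sketch. The code below is only an illustration of the pipeline idea, written serially for readability; `train_best_response` and `solve_meta_nash` are hypothetical helpers standing in for RL best-response training and restricted-game (meta) Nash solving. The authors' actual implementation is in the linked repository.

```python
# A simplified, serial sketch of the P2SRO pipeline idea; not the authors' code.
from typing import Callable, List


def p2sro_sketch(num_levels: int,
                 num_iterations: int,
                 train_best_response: Callable,   # hypothetical: RL best-response training
                 solve_meta_nash: Callable,       # hypothetical: Nash of the restricted game
                 initial_policy):
    fixed_population: List = [initial_policy]   # completed ("lower level") policies
    active = [None] * num_levels                # in-progress policies, one per pipeline level

    for _ in range(num_iterations):
        # Restricted-game meta-strategy over the completed policies.
        meta_strategy = solve_meta_nash(fixed_population)

        # Each active level trains only against policies from lower levels,
        # so in a real implementation all levels can run in parallel.
        for level in range(num_levels):
            lower = fixed_population + [p for p in active[:level] if p is not None]
            active[level] = train_best_response(opponents=lower,
                                                opponent_mixture=meta_strategy)

        # Promote the lowest active policy: freeze it, add it to the population,
        # and start a fresh policy at the top of the pipeline.
        fixed_population.append(active.pop(0))
        active.append(None)

    return fixed_population, solve_meta_nash(fixed_population)
```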
Related papers
- On Tractable $Φ$-Equilibria in Non-Concave Games [53.212133025684224]
Non-concave games introduce significant game-theoretic and optimization challenges.
We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $\Phi$-equilibria.
We also show that Online Gradient Descent can efficiently approximate $\Phi$-equilibria in non-trivial regimes.
arXiv Detail & Related papers (2024-03-13T01:51:30Z) - Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed
Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z) - Global Nash Equilibrium in Non-convex Multi-player Game: Theory and
Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, since no player can benefit from deviating unilaterally.
We build up the general theory and algorithms step by step.
arXiv Detail & Related papers (2023-01-19T11:36:50Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games [69.5064797859053]
We introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal policy to the population in each iteration.
SP-PSRO empirically tends to converge much faster than APSRO, and in many games it converges in just a few iterations.
arXiv Detail & Related papers (2022-07-13T22:55:51Z) - Efficient Policy Space Response Oracles [61.71849698253696]
The Policy Space Response Oracle method (PSRO) provides a general solution for finding Nash equilibria in two-player zero-sum games.
Central to our development is the newly introduced minimax optimization over unrestricted-restricted (URR) games.
We report a 50x speedup in wall-time, 10x data efficiency, and similar exploitability as existing PSRO methods on Kuhn and Leduc Poker games.
arXiv Detail & Related papers (2022-01-28T17:54:45Z) - Anytime Optimal PSRO for Two-Player Zero-Sum Games [17.821479538423155]
Policy Space Response Oracles (PSRO) is a reinforcement learning algorithm for games that can handle continuous actions.
AODO is a double oracle algorithm for 2-player zero-sum games that converges to a Nash equilibrium.
We show that our methods achieve far lower exploitability than DO and PSRO and never increase exploitability.
arXiv Detail & Related papers (2022-01-19T16:34:11Z) - XDO: A Double Oracle Algorithm for Extensive-Form Games [14.823154995416997]
We propose an extensive-form double oracle algorithm that converges to an approximate Nash equilibrium linearly in the number of infostates.
Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate.
In experiments on a modified Leduc poker game, we show that XDO achieves over 11x lower exploitability than CFR and over 82x lower exploitability than PSRO and XFP.
arXiv Detail & Related papers (2021-03-11T03:05:44Z)
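Several of the papers above (PSRO, AODO, XDO) build on the double oracle loop: repeatedly solve the restricted game over the current strategy populations, then expand each population with a best response to the opponent's meta-Nash mixture. The sketch below illustrates only this generic loop on a small zero-sum matrix game, where the restricted game can be solved exactly with a linear program; it is not an implementation of any specific algorithm listed here.

```python
# Minimal double-oracle sketch for a fully known zero-sum matrix game.
import numpy as np
from scipy.optimize import linprog


def solve_restricted_game(payoffs: np.ndarray):
    """Row player's Nash mixture for a zero-sum matrix game via an LP.
    payoffs[i, j] = row player's payoff for row strategy i vs. column strategy j."""
    m, n = payoffs.shape
    # Variables: m mixture weights x plus the game value v; objective: maximize v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Constraints: v - (A^T x)_j <= 0 for every column j.
    A_ub = np.hstack([-payoffs.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Mixture weights sum to 1.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], -res.fun  # (mixture, game value)


def double_oracle(full_payoffs: np.ndarray, iters: int = 50):
    """Grow row/column populations by best-responding to the opponent's meta-Nash."""
    rows, cols = [0], [0]                       # start with one pure strategy each
    for _ in range(iters):
        sub = full_payoffs[np.ix_(rows, cols)]
        row_mix, _ = solve_restricted_game(sub)
        col_mix, _ = solve_restricted_game(-sub.T)   # column player's side of the game
        # Best responses in the FULL game against the opponents' restricted mixtures.
        br_row = int(np.argmax(full_payoffs[:, cols] @ col_mix))
        br_col = int(np.argmin(row_mix @ full_payoffs[rows, :]))
        if br_row in rows and br_col in cols:   # no new strategies: converged
            break
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
    return rows, cols, row_mix, col_mix


if __name__ == "__main__":
    # Example: rock-paper-scissors; the loop recovers the uniform mixture.
    rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
    print(double_oracle(rps))
```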