Efficient Policy Space Response Oracles
- URL: http://arxiv.org/abs/2202.00633v1
- Date: Fri, 28 Jan 2022 17:54:45 GMT
- Title: Efficient Policy Space Response Oracles
- Authors: Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong
Yu
- Abstract summary: The Policy Space Response Oracle (PSRO) method provides a general solution to Nash equilibrium in two-player zero-sum games.
Central to our development is the newly-introduced subroutine of minimax optimization on unrestricted-restricted (URR) games.
We report a 50x speedup in wall-time, a 10x improvement in data efficiency, and exploitability similar to that of existing PSRO methods on Kuhn and Leduc Poker games.
- Score: 61.71849698253696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Policy Space Response Oracle (PSRO) method provides a general
solution to Nash equilibrium in two-player zero-sum games but suffers from two
problems: (1) computational inefficiency, because the current population must
be repeatedly evaluated through game simulations; and (2) exploration
inefficiency, because best responses are learned against a fixed meta-strategy
at each iteration. In this work, we propose Efficient PSRO (EPSRO), which
substantially improves the efficiency of both steps. Central to our
development is the
newly-introduced subroutine of minimax optimization on unrestricted-restricted
(URR) games. By solving URR at each step, one can evaluate the current game and
compute the best response in one forward pass with no need for game
simulations. Theoretically, we prove that the solution procedures of EPSRO
offer a monotonic improvement in exploitability. Moreover, a desirable property
of EPSRO is that it is parallelizable, which allows for efficient exploration in
the policy space that induces behavioral diversity. We test EPSRO on three
classes of games and report a 50x speedup in wall-time, a 10x improvement in
data efficiency, and exploitability similar to that of existing PSRO methods
on Kuhn and Leduc Poker games.
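To make the URR step concrete, here is a hedged toy sketch on a zero-sum matrix game: the row player is unrestricted (all actions available) while the column player is restricted to a small population, and a single minimax solve returns both a meta-strategy over the population and an approximate best response, with no separate simulation phase. This is an illustration under matrix-game assumptions, not the paper's RL implementation; `solve_urr` and the other names are hypothetical.

```python
import numpy as np

# Toy unrestricted-restricted (URR) minimax step on a zero-sum matrix game.
# Illustrative only: EPSRO operates on RL policies, not payoff matrices.

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))   # row player's payoffs; column player gets -A
population = [0, 1]           # column player restricted to these strategies

def solve_urr(A, population, iters=2000, lr=0.5):
    """Approximate max_x min_{y in simplex(population)} x^T A[:, pop] y
    via multiplicative-weights updates for both players."""
    sub = A[:, population]
    x = np.full(sub.shape[0], 1.0 / sub.shape[0])   # unrestricted mixture
    y = np.full(sub.shape[1], 1.0 / sub.shape[1])   # restricted mixture
    x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)
    for _ in range(iters):
        x *= np.exp(lr * (sub @ y));   x /= x.sum()   # row player ascends
        y *= np.exp(-lr * (sub.T @ x)); y /= y.sum()  # column player descends
        x_avg += x; y_avg += y
    return x_avg / iters, y_avg / iters   # average iterates approach a NE

x, y = solve_urr(A, population)
# One solve yields both the meta-strategy y over the population and an
# approximate best response against it, with no extra game simulations.
best_response = int(np.argmax(A[:, population] @ y))
print("meta-strategy:", y.round(3), "best response action:", best_response)
```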
Related papers
- Self-adaptive PSRO: Towards an Automatic Population-based Game Solver [34.326819257554874]
Policy-Space Response Oracles (PSRO), as a general algorithmic framework, has achieved state-of-the-art performance in learning equilibrium policies of two-player zero-sum games.
We make the first attempt to investigate the possibility of self-adaptively determining the optimal hyperparameter values in the PSRO framework, yielding Self-adaptive PSRO (SPSRO).
Experiments on various two-player zero-sum games demonstrate the superiority of SPSRO over different baselines.
arXiv Detail & Related papers (2024-04-17T07:40:57Z) - ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
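As a minimal illustration of descending an exploitability objective, the sketch below runs exploitability descent on a random zero-sum matrix game, using exact best responses where ApproxED would use learned ones and a softmax parameterization where the paper targets continuous action sets; every name here is an assumption for illustration.

```python
import numpy as np

# Exploitability descent on a zero-sum matrix game (a stand-in for the
# learned-best-response approach on continuous-action games).

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))   # row player's payoff matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jvp(p, g):
    # (diag(p) - p p^T) @ g: pulls a strategy gradient back to the logits.
    return p * (g - p @ g)

def exploitability(x, y):
    # Sum of both players' best-response gains over the current profile.
    return (A @ y).max() - (x @ A).min()

theta_x, theta_y = np.zeros(4), np.zeros(4)
lr = 0.1
for _ in range(3000):
    x, y = softmax(theta_x), softmax(theta_y)
    i_star = int(np.argmax(A @ y))   # row best response to y
    j_star = int(np.argmin(x @ A))   # column best response to x
    # Subgradients of exploitability w.r.t. each player's own strategy.
    theta_x -= lr * softmax_jvp(x, -A[:, j_star])
    theta_y -= lr * softmax_jvp(y, A[i_star, :])
print("exploitability:", exploitability(softmax(theta_x), softmax(theta_y)))
```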
arXiv Detail & Related papers (2023-01-20T23:55:30Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
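For readers unfamiliar with the base method, the snippet below runs classical fictitious play on rock-paper-scissors: each player repeatedly best-responds to the opponent's empirical average strategy, and the empirical frequencies converge to the Nash equilibrium. The paper's actual algorithms add UCB-type optimism and simultaneous policy optimization for Markov games with unknown transitions, none of which appears in this toy.

```python
import numpy as np

# Classical fictitious play on rock-paper-scissors (zero-sum): empirical
# play frequencies converge to the uniform Nash equilibrium.

A = np.array([[ 0., -1.,  1.],   # row player's payoffs
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
row_counts = np.ones(3)          # play counts, initialized uniformly
col_counts = np.ones(3)
for _ in range(20000):
    x_bar = row_counts / row_counts.sum()   # row's empirical mixture
    y_bar = col_counts / col_counts.sum()   # column's empirical mixture
    row_counts[np.argmax(A @ y_bar)] += 1   # best response to average play
    col_counts[np.argmin(x_bar @ A)] += 1
print("row:", (row_counts / row_counts.sum()).round(3),
      "col:", (col_counts / col_counts.sum()).round(3))  # both near 1/3 each
```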
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games [69.5064797859053]
We introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal policy to the population in each iteration.
SP-PSRO empirically tends to converge much faster than APSRO and, in many games, converges in just a few iterations.
arXiv Detail & Related papers (2022-07-13T22:55:51Z) - Anytime Optimal PSRO for Two-Player Zero-Sum Games [17.821479538423155]
Policy Space Response Oracles (PSRO) is a reinforcement learning algorithm for games that can handle continuous actions.
The paper's AODO algorithm is a double oracle method for 2-player zero-sum games that converges to a Nash equilibrium.
We show that our methods achieve far lower exploitability than DO and PSRO and never increase exploitability.
arXiv Detail & Related papers (2022-01-19T16:34:11Z) - DORO: Distributional and Outlier Robust Optimization [98.44757325531631]
We propose the framework of DORO, for Distributional and Outlier Robust Optimization.
At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers.
We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets.
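One concrete reading of the "refined risk function" is the CVaR instance sketched below: treat the largest eps-fraction of per-sample losses as potential outliers, discard them, and apply the DRO risk (here CVaR, the mean of the worst remaining alpha-fraction) to the rest. The parameter names and toy data are assumptions for illustration, not the paper's notation or experiments.

```python
import numpy as np

# Ordered-statistics sketch of an outlier-robust CVaR risk: drop the
# eps-fraction largest losses as suspected outliers, then take the CVaR
# (mean of the worst alpha-fraction) of what remains.

def cvar_doro_risk(losses, alpha=0.2, eps=0.05):
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    n = len(losses)
    kept = losses[int(np.ceil(eps * n)):]    # discard suspected outliers
    k = max(1, int(np.ceil(alpha * len(kept))))
    return kept[:k].mean()                   # CVaR over the kept losses

losses = np.concatenate([np.random.default_rng(2).exponential(1.0, 1000),
                         [50.0, 80.0]])      # two gross outliers
print("plain CVaR:", np.sort(losses)[::-1][:int(0.2 * len(losses))].mean())
print("CVaR-DORO :", cvar_doro_risk(losses))  # outliers no longer dominate
```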
arXiv Detail & Related papers (2021-06-11T02:59:54Z) - POMO: Policy Optimization with Multiple Optima for Reinforcement
Learning [8.819672165548477]
We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building a combinatorial optimization (CO) solver.
POMO is applicable to a wide range of CO problems. It is designed to exploit the symmetries in the representation of a CO solution.
We demonstrate the effectiveness of POMO by solving three popular NP-hard problems, namely traveling salesman (TSP), capacitated vehicle routing (CVRP), and 0-1 knapsack (KP).
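The symmetry trick can be summarized in a few lines: roll out one trajectory from each of N symmetric starting points (e.g., every city as the first stop of a TSP tour) and use the mean return across those rollouts as a shared REINFORCE baseline, so no learned critic is needed. The sketch below shows only this advantage computation, under assumed names; network and rollout code are elided.

```python
import numpy as np

# POMO-style shared baseline: one rollout per symmetric starting point,
# baselined by the mean return over all of them (no critic required).

def pomo_advantages(returns):
    """returns: shape (N,), one return per symmetric rollout."""
    returns = np.asarray(returns, dtype=float)
    return returns - returns.mean()   # REINFORCE weight for each rollout

R = np.array([-10.2, -9.8, -11.5, -9.1])  # e.g., negative TSP tour lengths
adv = pomo_advantages(R)
# Training step would use: loss = -(adv * trajectory_log_probs).mean()
print(adv)   # rollouts better than the group mean get positive weight
```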
arXiv Detail & Related papers (2020-10-30T00:57:50Z) - An Online Method for A Class of Distributionally Robust Optimization
with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z) - Pipeline PSRO: A Scalable Approach for Finding Approximate Nash
Equilibria in Large Games [11.866835246140647]
Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm guaranteed to converge to an approximate Nash equilibrium.
We introduce Pipeline PSRO, the first scalable general method for finding approximate Nash equilibria in large games.
We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.
arXiv Detail & Related papers (2020-06-15T17:17:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.